BARC/EADS Talk by Matteo Ceccarello


Making Big Data Small.


A general trend in the history of computing has been the continuous growth in the amount of data that needs to be processed. In particular, since the invention of the World Wide Web, we have witnessed a constant acceleration of this growth, leading to the terabyte-sized datasets we have to deal with today, generally referred to as "Big Data".

To process these large volumes of data in a timely fashion, several models of computation have been developed, in particular the Streaming and MapReduce models. These models capture the limitations of those computational infrastructures where the memory available to each node is significantly smaller than the amount of data to be processed.

In this talk, after introducing the MapReduce and Streaming models, I will describe a general strategy based on clustering which enables the development of efficient approximation algorithms for both models. We will then see a number of prominent case study problems where this strategy yields accurate and efficient algorithms both in theory and in practice.


Matteo received his PhD in 2017 from the University of Padova, Italy, under the supervision of Professor Andrea Pietracaprina, and in 2017 he was a postdoc at the same university. During his PhD, he was a visiting scholar for one semester at Brown University, USA, in the group of Professor Eli Upfal.

Matteo’s main research interest is big data algorithms in the parallel and distributed setting (e.g., MapReduce), in particular the use of clustering as a tool to develop practical and efficient algorithms with provable guarantees for large-scale problems.