Designing good algorithms for MapReduce and beyond

  • Authors:
  • Foto N. Afrati (National Technical University of Athens); Magdalena Balazinska (University of Washington); Anish Das Sarma (Google); Bill Howe (University of Washington); Semih Salihoglu (Stanford University); Jeffrey D. Ullman (Stanford University)

  • Venue:
  • Proceedings of the Third ACM Symposium on Cloud Computing
  • Year:
  • 2012

Abstract

As MapReduce/Hadoop grows in importance, we find more exotic applications being written for this platform, and not every one of them performs as well as we might wish. There are several reasons a MapReduce program can underperform expectations. One is the need to balance the communication cost of transporting data from the mappers to the reducers against the computation done at the mappers and reducers themselves. A second is selecting the number of rounds of MapReduce. A third is skew: if wall-clock time is important, then using many different reduce-keys and many compute nodes may minimize the time to finish the job, yet if the data is uncooperative and no provision is made to distribute it evenly, much of the work may fall to a single node.
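
To make the skew issue concrete, the sketch below is a minimal simulation in plain Python (it is not Hadoop code and not a technique taken from the tutorial itself; the key names, the synthetic data, and the NUM_SHARDS parameter are illustrative assumptions). It mimics the shuffle step to show how a single hot reduce-key concentrates work on one reducer, and how "salting" that key spreads the work across several reducers at the cost of a short combining step.

    import random
    from collections import defaultdict

    def shuffle(pairs):
        """Group (key, value) pairs by key, as the MapReduce shuffle does."""
        groups = defaultdict(list)
        for k, v in pairs:
            groups[k].append(v)
        return groups

    # Skewed input: 9,000 of 10,000 records share the reduce-key "hot".
    records = [("hot", 1)] * 9000 + [(f"k{i}", 1) for i in range(1000)]

    # Naive shuffle: whichever reducer receives "hot" gets 90% of the values
    # and becomes the straggler that determines wall-clock time.
    naive = shuffle(records)
    print(max(len(vs) for vs in naive.values()))   # 9000

    # Salted shuffle: append a random shard id to the hot key, splitting its
    # group into NUM_SHARDS pieces of roughly equal size.
    NUM_SHARDS = 10
    salted = shuffle(
        (f"{k}#{random.randrange(NUM_SHARDS)}", v) if k == "hot" else (k, v)
        for k, v in records
    )
    print(max(len(vs) for vs in salted.values()))  # roughly 900

    # For a distributive aggregate such as SUM, a short second round
    # recombines the shards of the hot key.
    total_hot = sum(sum(vs) for k, vs in salted.items() if k.startswith("hot#"))
    print(total_hot)                               # 9000

Salting flattens the skew but adds a round of MapReduce and some extra communication, which is precisely the kind of trade-off among rounds, communication cost, and skew that the abstract describes.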