Adaptive MapReduce using situation-aware mappers

  • Authors:
  • Rares Vernica;Andrey Balmin;Kevin S. Beyer;Vuk Ercegovac

  • Affiliations:
  • Hewlett-Packard Laboratories, Palo Alto, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA

  • Venue:
  • Proceedings of the 15th International Conference on Extending Database Technology
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose new adaptive runtime techniques for MapReduce that improve performance and simplify job tuning. We implement these techniques by breaking a key assumption of MapReduce that mappers run in isolation. Instead, our mappers communicate through a distributed meta-data store and are aware of the global state of the job. However, we still preserve the fault-tolerance, scalability, and programming API of MapReduce. We utilize these "situation-aware mappers" to develop a set of techniques that make MapReduce more dynamic: (a) Adaptive Mappers dynamically take multiple data partitions (splits) to amortize mapper start-up costs; (b) Adaptive Combiners improve local aggregation by maintaining a cache of partial aggregates for the frequent keys; (c) Adaptive Sampling and Partitioning sample the mapper outputs and use the obtained statistics to produce balanced partitions for the reducers. Our experimental evaluation shows that adaptive techniques provide up to 3x performance improvement, in some cases, and dramatically improve performance stability across the board.