Tarazu: optimizing MapReduce on heterogeneous clusters

  • Authors:
  • Faraz Ahmad;Srimat T. Chakradhar;Anand Raghunathan;T. N. Vijaykumar

  • Affiliations:
  • Purdue University, West Lafayette, IN, USA;NEC Laboratories America, Princeton, NJ, USA;Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA

  • Venue:
  • ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data center-scale clusters are evolving towards heterogeneous hardware for power, cost, differentiated price-performance, and other reasons. MapReduce is a well-known programming model to process large amount of data on data center-scale clusters. Most MapReduce implementations have been designed and optimized for homogeneous clusters. Unfortunately, these implementations perform poorly on heterogeneous clusters (e.g., on a 90-node cluster that contains 10 Xeon-based servers and 80 Atom-based servers, Hadoop performs worse than on 10-node Xeon-only or 80-node Atom-only homogeneous sub-clusters for many of our benchmarks). This poor performance remains despite previously proposed optimizations related to management of straggler tasks. In this paper, we address MapReduce's poor performance on heterogeneous clusters. Our first contribution is that the poor performance is due to two key factors: (1) the non-intuitive effect that MapReduce's built-in load balancing results in excessive and bursty network communication during the Map phase, and (2) the intuitive effect that the heterogeneity amplifies load imbalance in the Reduce computation. Our second contribution is Tarazu, a suite of optimizations to improve MapReduce performance on heterogeneous clusters. Tarazu consists of (1) Communication-Aware Load Balancing of Map computation (CALB) across the nodes, (2) Communication-Aware Scheduling of Map computation (CAS) to avoid bursty network traffic and (3) Predictive Load Balancing of Reduce computation (PLB) across the nodes. Using the above 90-node cluster, we show that Tarazu significantly improves performance over a baseline of Hadoop with straightforward tuning for hardware heterogeneity.