PIKACHU: How to Rebalance Load in Optimizing MapReduce on Heterogeneous Clusters

  • Authors:
  • Rohan Gandhi;Di Xie;Y. Charlie Hu

  • Affiliations:
  • Purdue University;Purdue University;Purdue University

  • Venue:
  • USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
  • Year:
  • 2013

Abstract

For power, cost, and pricing reasons, datacenters are evolving towards heterogeneous hardware. However, MapReduce implementations, which power a representative class of datacenter applications, were originally designed for homogeneous clusters and perform poorly on heterogeneous clusters. The natural solution, rebalancing load among the reducers running on heterogeneous nodes, has been explored in Tarazu, but was shown to be only mildly effective. In this paper, we revisit the key design challenge in this important optimization for MapReduce on heterogeneous clusters and make three contributions. (1) We show that Tarazu estimates the target load distribution too early in MapReduce job execution, which leaves the rebalanced load far from optimal. (2) We articulate the delicate tradeoff between estimation accuracy and the work wasted by delaying load adjustment, and propose a load rebalancing scheme that balances the two. (3) We implement our design in the PIKACHU task scheduler, which outperforms Hadoop by up to 42% and Tarazu by up to 23%.
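The effect of estimation accuracy on rebalancing can be illustrated with a small sketch. It is not the PIKACHU or Tarazu algorithm, which the abstract does not specify; it only assumes that reducer input is divided into equal-sized partitions and that each node has an estimated relative processing rate. The function names, node names, and rate values are hypothetical.

```python
def proportional_shares(rates, num_partitions):
    """Split num_partitions among nodes in proportion to their estimated rates.

    Uses the largest-remainder method so that the shares are integers
    and sum exactly to num_partitions.
    """
    total = sum(rates.values())
    exact = {node: num_partitions * r / total for node, r in rates.items()}
    shares = {node: int(e) for node, e in exact.items()}
    leftover = num_partitions - sum(shares.values())
    # Hand the remaining partitions to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


def makespan(shares, true_rates):
    """Finish time of the slowest node, given the true processing rates."""
    return max(shares[node] / true_rates[node] for node in shares)


# Hypothetical two-node cluster: the fast node is really 3x the slow one,
# but an early estimate (assumed here) sees only a 1.5x difference.
true_rates = {"fast": 3.0, "slow": 1.0}
early_estimate = {"fast": 1.5, "slow": 1.0}

even = {"fast": 20, "slow": 20}
early = proportional_shares(early_estimate, 40)
accurate = proportional_shares(true_rates, 40)

print(makespan(even, true_rates))      # 20.0
print(makespan(early, true_rates))     # 16.0
print(makespan(accurate, true_rates))  # 10.0
```

In this toy setting an inaccurate early estimate recovers only part of the possible gain. The sketch leaves out the other side of the tradeoff named in the abstract: waiting for a better estimate means more work has already been placed and may be wasted when the load is adjusted.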