Gunther: Search-Based Auto-Tuning of MapReduce

  • Authors:
  • Guangdeng Liao;Kushal Datta;Theodore L. Willke

  • Affiliations:
  • Intel Labs, Hillsboro, Oregon;Intel Labs, Hillsboro, Oregon;Intel Labs, Hillsboro, Oregon

  • Venue:
  • Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
  • Year:
  • 2013

Abstract

MapReduce has emerged as a very popular programming model for large-scale data analytics. Despite its industry-wide acceptance, the open source Apache™ Hadoop™ framework for MapReduce remains difficult to optimize, particularly in large-scale production environments. The vast search space defined by the hundreds of MapReduce configuration parameters, together with the complex interactions between them, makes manual tuning time-consuming, so an automated approach is needed. In this paper we evaluate approaches to the automatic tuning of Hadoop MapReduce, including those based on cost models and on machine learning models. We determine that they are inadequate and instead propose a search-based approach called Gunther for Hadoop MapReduce optimization. Gunther uses a Genetic Algorithm that is specially designed to aggressively identify parameter settings that result in near-optimal job execution time. We evaluate Gunther on two types of clusters with different resource characteristics. Our experiments demonstrate that Gunther can obtain near-optimal performance within a small number of trials (
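
To make the idea of search-based tuning concrete, the following is a minimal sketch of a generic genetic algorithm over a few configuration parameters. It is not Gunther's actual algorithm, which the abstract does not specify. The parameter ranges, the `synthetic_job_time` cost function (a stand-in for running a real job and measuring its execution time), and all function names are assumptions made for illustration.

```python
import random

# Assumed search space: parameter name -> (min, max) integer range.
# The names are Hadoop 1.x settings; the ranges are illustrative only.
SEARCH_SPACE = {
    "io.sort.mb": (50, 500),
    "io.sort.factor": (10, 100),
    "mapred.reduce.tasks": (1, 64),
}

def synthetic_job_time(config):
    """Hypothetical stand-in for running a job and timing it."""
    return (
        abs(config["io.sort.mb"] - 300) / 10.0
        + abs(config["io.sort.factor"] - 60) / 5.0
        + abs(config["mapred.reduce.tasks"] - 32)
        + 100.0
    )

def random_config(rng):
    # Draw every parameter uniformly from its range.
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each parameter is copied from one parent at random.
    return {name: (a if rng.random() < 0.5 else b)[name] for name in SEARCH_SPACE}

def mutate(config, rng, rate=0.2):
    # Resample each parameter within its range with probability `rate`.
    out = dict(config)
    for name, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            out[name] = rng.randint(lo, hi)
    return out

def tune(evaluate, generations=20, population_size=12, elite=2, seed=0):
    """Return (best_config, best_time) found by a simple genetic algorithm."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(population_size)]
    best, best_time = None, float("inf")
    for _ in range(generations):
        # Each evaluation corresponds to one trial run of the job.
        scored = sorted(((evaluate(c), c) for c in population), key=lambda p: p[0])
        if scored[0][0] < best_time:
            best_time, best = scored[0]
        # Keep the fastest half as parents and carry the elite over unchanged.
        parents = [c for _, c in scored[: population_size // 2]]
        next_pop = [c for _, c in scored[:elite]]
        while len(next_pop) < population_size:
            a, b = rng.sample(parents, 2)
            next_pop.append(mutate(crossover(a, b, rng), rng))
        population = next_pop
    return best, best_time

if __name__ == "__main__":
    config, seconds = tune(synthetic_job_time)
    print(config, seconds)
```

In a real setting, `evaluate` would submit the job with the candidate configuration and return the measured execution time, so the number of trials is `generations * population_size`. Keeping that product small is what makes a search-based approach practical on a production cluster.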