Towards automatic optimization of MapReduce programs

  • Authors:
  • Shivnath Babu

  • Affiliations:
  • Duke University, Durham, NC, USA

  • Venue:
  • Proceedings of the 1st ACM symposium on Cloud computing
  • Year:
  • 2010

Abstract

Timely and cost-effective processing of large datasets has become a critical ingredient for the success of many academic, government, and industrial organizations. The combination of MapReduce frameworks and cloud computing is an attractive proposition for these organizations. However, even to run a single program in a MapReduce framework, users or system administrators have to set a number of tuning parameters. Users often run into performance problems because they don't know how to set these parameters, or because they don't even know that these parameters exist. With MapReduce being a relatively new technology, it is not easy to find qualified administrators. In this position paper, we make a case for techniques to automate the setting of tuning parameters for MapReduce programs. The objective is to provide good out-of-the-box performance for ad hoc MapReduce programs run on large datasets. This feature can go a long way towards improving the productivity of users who cannot optimize programs themselves because they are unfamiliar with MapReduce or with the data being processed.