Towards optimizing Hadoop provisioning in the cloud

  • Authors:
  • Karthik Kambatla;Abhinav Pathak;Himabindu Pucha

  • Affiliations:
  • Purdue University;Purdue University;IBM Research Almaden

  • Venue:
  • HotCloud'09 Proceedings of the 2009 conference on Hot topics in cloud computing
  • Year:
  • 2009

Abstract

Data analytics is becoming increasingly prominent in a variety of application areas, ranging from extracting business intelligence to processing data from scientific studies. The MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel. In this work, we argue that such MapReduce-based analytics are particularly synergistic with the pay-as-you-go model of a cloud platform. However, a key challenge facing end users in this environment is provisioning MapReduce applications so as to minimize the incurred cost while obtaining the best performance. This paper first motivates the importance of optimally provisioning a MapReduce job and demonstrates that existing approaches can result in far-from-optimal provisioning. We then present a preliminary approach that improves MapReduce provisioning by analyzing the resource consumption of the application at hand and comparing it with a database of similar resource consumption signatures of other applications.
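
The signature-matching idea in the last sentence can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say what a signature contains or how signatures are compared, so the three-component signature (CPU, disk, network), the Euclidean distance, the database entries, and the configuration fields below are all assumptions made for illustration.

```python
import math

# Hypothetical signature database. Each signature is an assumed triple
# (cpu_utilization, disk_io, network_io), each normalized to [0, 1].
# The "config" values are placeholders, not measured or recommended settings.
SIGNATURE_DB = [
    {"app": "sort-like", "signature": (0.30, 0.90, 0.60),
     "config": {"map_slots": 2, "reduce_slots": 2}},
    {"app": "grep-like", "signature": (0.50, 0.70, 0.10),
     "config": {"map_slots": 4, "reduce_slots": 1}},
    {"app": "cpu-bound", "signature": (0.95, 0.10, 0.05),
     "config": {"map_slots": 8, "reduce_slots": 2}},
]


def distance(a, b):
    """Euclidean distance between two signatures (an assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_match(signature, db=SIGNATURE_DB):
    """Return the database entry whose signature is nearest to the given one."""
    return min(db, key=lambda entry: distance(signature, entry["signature"]))


if __name__ == "__main__":
    # Signature of a new job, e.g. measured on a small sample run (assumed values).
    new_job = (0.90, 0.15, 0.10)
    match = closest_match(new_job)
    print(match["app"], match["config"])
```

Under these assumptions, a new job would reuse the provisioning parameters of the stored application whose resource usage it most resembles; a real system would need a principled choice of signature features and distance measure.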