A Practical Performance Model for Hadoop MapReduce

Authors:
Xuelian Lin;Zide Meng;Chuan Xu;Meng Wang
Venue:
CLUSTERW '12 Proceedings of the 2012 IEEE International Conference on Cluster Computing Workshops
Year:
2012

Cited 2

MRPacker: an SQL to mapreduce optimizer
Proceedings of the 22nd ACM international conference on information & knowledge management

Does RDMA-based enhanced Hadoop MapReduce need a new performance model?
Proceedings of the 4th annual Symposium on Cloud Computing

Abstract

An accurate performance model for MapReduce is increasingly important for analyzing and optimizing MapReduce jobs. It is also a precondition for implementing cost-based scheduling strategies or for translating Hive-like query jobs into sets of low-cost MapReduce jobs. However, the multiple processing steps in a MapReduce task, the complexity of the relationships among these steps, and the difficulty of measuring the computational complexity of a MapReduce task greatly challenge the development and application of a precise performance model. In this paper, we define the concept of the relative computational complexity of a MapReduce task to estimate the task's complexity, and illustrate how to measure it. We then analyze the detailed composition of MapReduce tasks and the relationships among them, decompose the major cost items, and present a vector-style cost model with equations to calculate each cost item. Moreover, we provide equations to estimate task execution time based on the cost vectors. Experiments on several Hadoop clusters confirm the effectiveness of the proposed performance model.
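
The abstract does not give the cost items or the equations themselves, so the following is only a minimal sketch of what a vector-style cost model could look like, not the authors' formulation. The cost item names, the per-unit rates, the treatment of CPU work as relative complexity times input size, and the purely additive combination (no overlap between steps) are all assumptions made for illustration.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CostVector:
    """Hypothetical cost items for one task (names are illustrative only)."""
    read_mb: float      # input data read, in MB
    cpu_units: float    # assumed: relative complexity * input MB
    local_io_mb: float  # data spilled/merged on local disk, in MB
    network_mb: float   # data transferred over the network, in MB
    write_mb: float     # output data written, in MB


def estimate_task_seconds(cost: CostVector, sec_per_unit: CostVector) -> float:
    """Weighted sum of cost items; assumes steps run one after another."""
    return sum(
        getattr(cost, f.name) * getattr(sec_per_unit, f.name)
        for f in fields(cost)
    )


# Made-up example: a map-like task over a 128 MB split with an assumed
# relative computational complexity of 2.0.
relative_complexity = 2.0
task_cost = CostVector(
    read_mb=128.0,
    cpu_units=relative_complexity * 128.0,
    local_io_mb=64.0,
    network_mb=0.0,
    write_mb=32.0,
)

# Made-up per-unit rates (seconds per MB or per CPU unit) for one cluster.
cluster_rates = CostVector(
    read_mb=0.01,
    cpu_units=0.005,
    local_io_mb=0.02,
    network_mb=0.05,
    write_mb=0.02,
)

estimated_seconds = estimate_task_seconds(task_cost, cluster_rates)
print(f"estimated task time: {estimated_seconds:.2f} s")
```

Keeping the cost quantities separate from the per-unit rates means the same task cost vector can be re-evaluated against rates measured on a different cluster, which is one plausible reason to prefer a vector form over a single scalar cost.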