Towards quantitative analysis of data intensive computing: a case study of Hadoop

  • Authors:
  • Peng Wang; Dan Meng; Zhaoxia Han; Xu Liu

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; Rice University, Houston, USA

  • Venue:
  • Proceedings of the 8th ACM international conference on Autonomic computing
  • Year:
  • 2011

Abstract

In modern data centers, Hadoop has been widely used to perform data-intensive computation. Administrators of large-scale Hadoop clusters rely on statistical data collected at runtime to measure how efficiently the cluster is utilized. In this paper, we propose three statistical metrics (data locality ratio, load balance coefficient, and access balance coefficient) to quantify performance losses in data-intensive applications. We evaluated our metrics using a large-scale web click-stream application running on a production Hadoop cluster at Tencent Inc.