Dynamic split model of resource utilization in MapReduce

  • Authors:
  • Xiao Wei Wang;Jie Zhang;Hua Ming Liao;Li Zha

  • Affiliations:
  • Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China;Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

  • Venue:
  • Proceedings of the second international workshop on Data intensive computing in the clouds
  • Year:
  • 2011

Abstract

MapReduce is gaining popularity as a parallel programming model for large-scale data processing. However, we find that some traditional MapReduce platforms utilize cluster resources poorly, because the traditional multi-phase parallel model and some existing scheduling policies used in cluster environments have drawbacks. We address these problems by designing a Dynamic Split Model of resource utilization, which comprises two techniques: Dynamic Resource Allocation, which considers phase priority as well as job requirements when allocating resources, and Resource Usage Pipeline, which assigns tasks dynamically. We evaluate our optimizations on top of Hadoop, and the results show that these techniques improve throughput by 21.72% and yield an average wall time gain of 55.83%. They also raise the percentage of user CPU utilization by 12.93% and reduce the percentages of iowait and idle CPU utilization by 6.61% and 6.73%, respectively. Upstream and downstream speeds increase by 11.3% and 23.5%, respectively. In addition, the disk I/O bottleneck is relieved by 30.3%.