Automatic task slots assignment in Hadoop MapReduce

Authors:
Kun Wang;Ben Tan;Juwei Shi;Bo Yang
Affiliations:
Peking University;IBM Research - China;IBM Research - China;IBM Research - China
Venue:
Proceedings of the 1st Workshop on Architectures and Systems for Big Data
Year:
2011

Abstract

In this paper, we address the problem caused by the fixed assignment of task slots in Hadoop MapReduce. Manually configuring optimal task slots is infeasible because the characteristics of workloads differ. We design and implement an automatic control mechanism that dynamically assigns task slots based on the resource utilization of each TaskTracker node. The assignment takes the lag period into account. The mechanism can improve cluster-wide resource utilization and avoid contention. Experimental results show that our implementation can dynamically adjust the task slot capacity to the optimal setting at runtime. In some cases, such as Word Count, our control mechanism outperforms the current Hadoop configured with the optimal task slot setting found by manual tuning.
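The abstract does not give the control law, so the following is only a minimal sketch of what a utilization-driven slot controller with a lag period could look like. It is not the authors' implementation. The class name `SlotController`, the thresholds, the one-slot step size, and the tick-based cooldown are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SlotController:
    """Hypothetical per-node controller; every name and value here is illustrative."""

    slots: int = 2          # current task slot capacity on this node
    min_slots: int = 1
    max_slots: int = 16
    low_util: float = 0.60  # below this, the node is assumed to be underused
    high_util: float = 0.90 # above this, the node is assumed to be contended
    lag_ticks: int = 3      # ticks to wait before the effect of a change is trusted
    _cooldown: int = 0      # remaining ticks of the current lag period

    def step(self, utilization: float) -> int:
        """Consume one utilization sample (0.0 to 1.0) and return the slot capacity."""
        # During the lag period, earlier changes have not yet shown up in the
        # measurements, so the capacity is held steady.
        if self._cooldown > 0:
            self._cooldown -= 1
            return self.slots

        if utilization < self.low_util and self.slots < self.max_slots:
            self.slots += 1
            self._cooldown = self.lag_ticks
        elif utilization > self.high_util and self.slots > self.min_slots:
            self.slots -= 1
            self._cooldown = self.lag_ticks
        return self.slots


# Made-up utilization trace: an underused node that later becomes contended.
controller = SlotController()
trace = [0.40, 0.45, 0.50, 0.55, 0.58, 0.95, 0.95, 0.95, 0.95, 0.95]
history = [controller.step(u) for u in trace]
print(history)
```

Holding the capacity fixed for a few samples after each change is one simple way to account for a lag period: it keeps the controller from reacting to measurements that do not yet reflect its last adjustment. A real controller would likely need separate map and reduce slot counts and several resource metrics.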