HAT: history-based auto-tuning MapReduce in heterogeneous environments

  • Authors:
  • Quan Chen;Minyi Guo;Qianni Deng;Long Zheng;Song Guo;Yao Shen

  • Affiliations:
  • Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China;Huazhong University of Science and Technology, Wuhan, China and School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan;School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan;Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2013

Abstract

In the MapReduce model, a job is divided into a series of map tasks and reduce tasks. A few slow tasks can seriously prolong the execution time of the whole job, especially in heterogeneous environments. To finish slow tasks as soon as possible, current MapReduce schedulers launch a backup task on another node for each slow task. However, traditional MapReduce schedulers cannot detect slow tasks correctly because they cannot estimate task progress accurately (Hadoop home page http://hadoop.apache.org/ , 2011; Zaharia et al. in 8th USENIX symposium on operating systems design and implementation, ACM, New York, pp. 29-42, 2008). To solve this problem, this paper proposes a History-based Auto-Tuning (HAT) MapReduce scheduler, which calculates task progress accurately and adapts automatically to a continuously varying environment. HAT tunes the weight of each phase of a map task and a reduce task according to the values of those weights observed in historical tasks, and uses the tuned weights to calculate the progress of current tasks. Based on this accurately calculated progress, HAT estimates the remaining time of each task and launches backup tasks for the tasks with the longest remaining time. Experimental results show that HAT improves the performance of MapReduce applications by up to 37% compared with Hadoop and by up to 16% compared with the LATE scheduler.
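The abstract describes the mechanism only in words, so the following Python sketch is an illustration under stated assumptions, not the authors' implementation. It assumes that a phase weight is the share of total task time that historical tasks spent in that phase, that progress is the weighted sum of completed phases plus the completed fraction of the current phase, and that remaining time is extrapolated from elapsed time and progress (a LATE-style estimate). All names (`tune_weights`, `progress`, `remaining_time`, `pick_backup_candidate`, `Task`) are hypothetical.

```python
from dataclasses import dataclass


def tune_weights(history):
    """Derive phase weights from historical tasks (assumed rule).

    history: list of per-task phase durations, e.g. [[copy, sort, reduce], ...].
    Each weight is that phase's share of the total time over all history tasks.
    """
    totals = [sum(phase_times) for phase_times in zip(*history)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def progress(weights, phase_index, phase_fraction):
    """Weighted progress in [0, 1]: finished phases plus part of the current one."""
    return sum(weights[:phase_index]) + weights[phase_index] * phase_fraction


def remaining_time(elapsed, prog):
    """Extrapolate remaining time from elapsed time and progress (assumed estimator)."""
    if prog <= 0:
        return float("inf")  # no progress yet, so no finite estimate
    return elapsed * (1.0 - prog) / prog


@dataclass
class Task:
    name: str
    phase_index: int       # index of the phase the task is currently in
    phase_fraction: float  # fraction of the current phase already completed
    elapsed: float         # seconds since the task started


def pick_backup_candidate(tasks, weights):
    """Return the running task with the longest estimated remaining time."""
    return max(
        tasks,
        key=lambda t: remaining_time(
            t.elapsed, progress(weights, t.phase_index, t.phase_fraction)
        ),
    )


# Hypothetical history of two reduce tasks: [copy, sort, reduce] durations in seconds.
weights = tune_weights([[10, 10, 20], [30, 10, 20]])
running = [
    Task("reduce_0", phase_index=2, phase_fraction=0.5, elapsed=80.0),
    Task("reduce_1", phase_index=0, phase_fraction=0.5, elapsed=80.0),
]
candidate = pick_backup_candidate(running, weights)
```

In this sketch both tasks have run for the same time, but `reduce_1` is still halfway through its first phase, so it has the lower weighted progress and is the one selected for a backup task. A real scheduler would also need to cap the number of backup tasks and refresh the weights as new tasks complete, which the sketch leaves out.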