HAT: history-based auto-tuning MapReduce in heterogeneous environments

Authors:
Quan Chen;Minyi Guo;Qianni Deng;Long Zheng;Song Guo;Yao Shen
Affiliations:
Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China;Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China;Huazhong University of Science and Technology, Wuhan, China and School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan;School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Japan;Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China
Venue:
The Journal of Supercomputing
Year:
2013

Citing 24
Cited 1

Web Search for a Planet: The Google Cluster Architecture. IEEE Micro
The Google file system. SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
MapReduce: simplified data processing on large clusters. OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Bigtable: a distributed storage system for structured data. OSDI '06 Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7
Evaluating MapReduce for Multi-core and Multiprocessor Systems. HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
A break in the clouds: towards a cloud definition. ACM SIGCOMM Computer Communication Review
Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility. Future Generation Computer Systems
CloudBurst. Bioinformatics
CellMR: A framework for supporting mapreduce on asymmetric cell-based clusters. IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
MapReduce: a flexible data processing tool. Communications of the ACM - Amir Pnueli: Ahead of His Time
A Dynamic MapReduce Scheduler for Heterogeneous Workloads. GCC '09 Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing
Packing the most onto your cloud. Proceedings of the first international workshop on Cloud data management
Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system. IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
MapReduce System over Heterogeneous Mobile Devices. SEUS '09 Proceedings of the 7th IFIP WG 10.2 International Workshop on Software Technologies for Embedded and Ubiquitous Systems
FPMR: MapReduce framework on FPGA. Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays
ParaTimer: a progress indicator for MapReduce DAGs. Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Assigning tasks for efficiency in Hadoop: extended abstract. Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
A Map-Reduce System with an Alternate API for Multi-core Environments. CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
MapReduce for the cell broadband engine architecture. IBM Journal of Research and Development
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling. Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Improving MapReduce performance in heterogeneous environments. OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters. ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing
Dynamic proportional share scheduling in Hadoop. JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Mars: Accelerating MapReduce with Graphics Processors. IEEE Transactions on Parallel and Distributed Systems

Adaptive workload-aware task scheduling for single-ISA asymmetric multicore architectures. ACM Transactions on Architecture and Code Optimization (TACO)

Abstract

In the MapReduce model, a job is divided into a series of map tasks and reduce tasks. A few slow tasks can seriously prolong the execution time of the whole job, especially in heterogeneous environments. To finish slow tasks as soon as possible, current MapReduce schedulers launch a backup task on another node for each slow task. However, traditional MapReduce schedulers cannot detect slow tasks correctly because they cannot estimate task progress accurately (Hadoop home page http://hadoop.apache.org/ , 2011; Zaharia et al. in 8th USENIX symposium on operating systems design and implementation, ACM, New York, pp. 29-42, 2008). To solve this problem, this paper proposes a History-based Auto-Tuning (HAT) MapReduce scheduler, which calculates task progress accurately and adapts automatically to a continuously varying environment. HAT tunes the weight of each phase of a map task and a reduce task according to the values of those weights in historical tasks, and uses the tuned weights to calculate the progress of current tasks. Based on the accurately calculated progress, HAT estimates the remaining time of each task and launches backup tasks for the tasks with the longest remaining time. Experimental results show that HAT improves the performance of MapReduce applications by up to 37% compared with Hadoop and by up to 16% compared with the LATE scheduler.
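
The abstract describes the scheduler only in prose, so the following Python sketch is an illustration of the general idea rather than the authors' implementation. It assumes that a phase weight is the average share of task time that the phase took in finished tasks, and that remaining time is the remaining progress divided by the observed progress rate; neither formula is given in the abstract. All names (`tune_weights`, `RunningTask`, `progress`, `remaining_time`, `pick_backup_candidate`) are hypothetical.

```python
from dataclasses import dataclass


def tune_weights(history):
    """Average each phase's share of total task time over finished tasks.

    `history` is a list of per-task phase durations, one list per task.
    This averaging rule is an assumption, not taken from the paper.
    """
    n_phases = len(history[0])
    shares = [0.0] * n_phases
    for durations in history:
        total = sum(durations)
        for i, d in enumerate(durations):
            shares[i] += d / total
    return [s / len(history) for s in shares]


@dataclass
class RunningTask:
    name: str
    phase: int             # index of the phase currently executing
    phase_fraction: float  # fraction of the current phase already done (0..1)
    elapsed: float         # seconds since the task started


def progress(task, weights):
    """Finished phases count fully; the current phase counts partially."""
    return sum(weights[:task.phase]) + weights[task.phase] * task.phase_fraction


def remaining_time(task, weights):
    """Assumed estimate: remaining progress divided by the observed rate."""
    p = progress(task, weights)
    if p <= 0.0 or task.elapsed <= 0.0:
        return float("inf")  # no usable rate yet
    rate = p / task.elapsed
    return (1.0 - p) / rate


def pick_backup_candidate(tasks, weights):
    """Return the running task with the longest estimated remaining time."""
    return max(tasks, key=lambda t: remaining_time(t, weights))
```

With history-derived weights, a task that is halfway through a phase that historically dominates run time is credited with more progress than equal per-phase weights would give it, which changes which task looks slowest.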