A Dynamic MapReduce Scheduler for Heterogeneous Workloads

Authors:
Chao Tian;Haojie Zhou;Yongqiang He;Li Zha
Venue:
GCC '09 Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing
Year:
2009

Abstract

MapReduce is an important programming model for building data centers containing tens of thousands of nodes. In a practical data center of that scale, I/O-bound jobs and CPU-bound jobs, which demand different resources, commonly run simultaneously in the same cluster. The MapReduce framework has not addressed running these two kinds of jobs in parallel. In this paper, we give a new view of the MapReduce model and classify MapReduce workloads into three categories based on their CPU and I/O utilization. Building on this classification, we design a new dynamic MapReduce workload prediction mechanism, MR-Predict, which detects the workload type on the fly. We propose a Triple-Queue Scheduler based on the MR-Predict mechanism. The Triple-Queue Scheduler can improve the usage of both CPU and disk I/O resources under heterogeneous workloads, and it can improve Hadoop throughput by about 30% under such workloads.
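
The idea of classifying jobs by resource utilization and routing them to separate queues can be illustrated with a minimal sketch. It is not the authors' MR-Predict or Triple-Queue implementation: the abstract does not name the three categories or give any thresholds, so the category names (`CPU_BOUND`, `IO_BOUND`, `MIXED`), the 0.8 cutoffs, and the `JobSample`, `classify`, and `TripleQueue` names are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class WorkloadType(Enum):
    # Category names are assumed; the abstract only says "three categories".
    CPU_BOUND = "cpu_bound"
    IO_BOUND = "io_bound"
    MIXED = "mixed"


@dataclass
class JobSample:
    """Hypothetical utilization sample observed for a running job."""
    job_id: str
    cpu_util: float  # fraction of CPU capacity in use, 0.0 to 1.0
    io_util: float   # fraction of disk I/O bandwidth in use, 0.0 to 1.0


def classify(sample: JobSample,
             cpu_threshold: float = 0.8,
             io_threshold: float = 0.8) -> WorkloadType:
    """Label a job by which resource it saturates (thresholds are assumed)."""
    cpu_heavy = sample.cpu_util >= cpu_threshold
    io_heavy = sample.io_util >= io_threshold
    if cpu_heavy and not io_heavy:
        return WorkloadType.CPU_BOUND
    if io_heavy and not cpu_heavy:
        return WorkloadType.IO_BOUND
    return WorkloadType.MIXED


class TripleQueue:
    """Toy scheduler keeping one FIFO queue per workload category."""

    def __init__(self) -> None:
        self.queues = {kind: deque() for kind in WorkloadType}

    def submit(self, sample: JobSample) -> WorkloadType:
        """Classify the job and append its id to the matching queue."""
        kind = classify(sample)
        self.queues[kind].append(sample.job_id)
        return kind

    def next_batch(self) -> list:
        """Pick at most one CPU-bound and one I/O-bound job to co-run.

        Falls back to a single mixed job when both of those queues are empty.
        """
        batch = []
        for kind in (WorkloadType.CPU_BOUND, WorkloadType.IO_BOUND):
            if self.queues[kind]:
                batch.append(self.queues[kind].popleft())
        if not batch and self.queues[WorkloadType.MIXED]:
            batch.append(self.queues[WorkloadType.MIXED].popleft())
        return batch
```

In this sketch, pairing one CPU-bound job with one I/O-bound job per batch is the design choice that stands in for the abstract's goal of keeping both CPU and disk busy; a real scheduler would assign individual map and reduce tasks to slots rather than whole jobs.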