MapReduce: simplified data processing on large clusters. Communications of the ACM, 50th anniversary issue: 1958-2008.
CLOUDLET: towards MapReduce implementation on virtual machines. Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing.
A Dynamic MapReduce Scheduler for Heterogeneous Workloads. GCC '09: Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing.
Hadoop high availability through metadata replication. Proceedings of the First International Workshop on Cloud Data Management.
Accelerating MapReduce with Distributed Memory Cache. ICPADS '09: Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems.
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. Proceedings of the 5th European Conference on Computer Systems.
Towards automatic optimization of MapReduce programs. Proceedings of the 1st ACM Symposium on Cloud Computing.
Towards optimizing Hadoop provisioning in the cloud. HotCloud '09: Proceedings of the 2009 Conference on Hot Topics in Cloud Computing.
Improving MapReduce performance in heterogeneous environments. OSDI '08: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation.
SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters. Journal of Parallel and Distributed Computing.
With the rapid development of Internet applications, more and more network services and commercial applications are deployed in cloud computing environments, with petabytes of data to be processed. MapReduce is one of the best-known frameworks for large-scale data processing. This paper focuses on optimizing the MapReduce scheduler at the task level. We take into account the hardware configuration and real-time workload of the nodes in a Hadoop cluster, aiming to shorten the completion time of MapReduce jobs and to improve hardware resource utilization. We propose a load-driven task scheduler that assigns tasks to TaskTrackers according to the workload of the slave nodes. It is based on a Dynamic Slot Controller (DSC) that adaptively adjusts the Map task Slots (MS) and Reduce task Slots (RS) of the TaskTrackers running on the slave nodes. Our load-driven task scheduler shortens the completion time of a MapReduce job by 14% and improves the CPU utilization of the Hadoop cluster by 34% when processing 10 GB of data.
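To make the Dynamic Slot Controller idea concrete, the following is a minimal sketch of how a per-node controller might adaptively grow or shrink map and reduce slot counts based on observed CPU load. All names, thresholds, and slot bounds here are illustrative assumptions, not the paper's actual implementation (Hadoop 1.x exposes slot counts via `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, which such a controller would rewrite at runtime):

```python
# Hypothetical sketch of a load-driven Dynamic Slot Controller (DSC).
# Thresholds and bounds are assumed values for illustration only.

from dataclasses import dataclass


@dataclass
class SlotConfig:
    """Per-TaskTracker slot counts: MS (map slots) and RS (reduce slots)."""
    map_slots: int
    reduce_slots: int


def adjust_slots(cfg: SlotConfig, cpu_util: float,
                 low: float = 0.4, high: float = 0.85,
                 min_slots: int = 1, max_slots: int = 8) -> SlotConfig:
    """Return a new slot configuration for one slave node.

    If the node's CPU utilization is below `low`, offer one more slot of
    each kind (up to `max_slots`); if it is above `high`, withdraw one
    slot of each kind (down to `min_slots`); otherwise leave it alone.
    """
    if cpu_util < low:
        # Underloaded node: let the scheduler place more tasks here.
        return SlotConfig(min(cfg.map_slots + 1, max_slots),
                          min(cfg.reduce_slots + 1, max_slots))
    if cpu_util > high:
        # Overloaded node: shed load by shrinking the slot counts.
        return SlotConfig(max(cfg.map_slots - 1, min_slots),
                          max(cfg.reduce_slots - 1, min_slots))
    # Load is in the comfortable band: keep the current configuration.
    return cfg
```

A load-driven scheduler would then simply prefer TaskTrackers with free slots, so that underloaded nodes (whose slot counts the DSC has grown) naturally receive more tasks than overloaded ones.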