MapReduce: simplified data processing on large clusters. Communications of the ACM, 50th anniversary issue: 1958-2008.
CLOUDLET: towards MapReduce implementation on virtual machines. Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing.
A Dynamic MapReduce Scheduler for Heterogeneous Workloads. GCC '09: Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing.
Hadoop high availability through metadata replication. Proceedings of the First International Workshop on Cloud Data Management.
Accelerating MapReduce with Distributed Memory Cache. ICPADS '09: Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems.
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling. Proceedings of the 5th European Conference on Computer Systems.
Towards automatic optimization of MapReduce programs. Proceedings of the 1st ACM Symposium on Cloud Computing.
Towards optimizing Hadoop provisioning in the cloud. HotCloud '09: Proceedings of the 2009 Conference on Hot Topics in Cloud Computing.
Improving MapReduce performance in heterogeneous environments. OSDI '08: Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation.
SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters. Journal of Parallel and Distributed Computing.
With the rapid development of Internet applications, more and more network services and commercial applications are deployed in cloud computing environments, with petabytes of data to be processed. MapReduce is one of the best-known frameworks for large-scale data processing. This paper focuses on optimizing the MapReduce scheduler at the task level. We take into account the hardware configuration and real-time workload of the nodes in a Hadoop cluster, aiming to shorten the completion time of MapReduce jobs and to improve hardware resource utilization. We propose a load-driven task scheduler that assigns tasks to TaskTrackers according to the workload of the slave nodes. It is based on a Dynamic Slot Controller (DSC) that adaptively adjusts the Map task Slots (MS) and Reduce task Slots (RS) of the TaskTrackers running on the slave nodes. Our load-driven task scheduler shortens the completion time of a MapReduce job by 14% and improves the CPU utilization of the Hadoop cluster by 34% when processing 10 GB of data.
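To make the Dynamic Slot Controller idea concrete, the following is a minimal sketch of how a per-node controller might adaptively grow or shrink map and reduce slot counts based on observed CPU load. All names, thresholds, and slot bounds here are illustrative assumptions, not the paper's actual implementation (Hadoop 1.x exposes slot counts via `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`, which such a controller would rewrite at runtime):

```python
# Hypothetical sketch of a load-driven Dynamic Slot Controller (DSC).
# Thresholds and bounds are assumed values for illustration only.

from dataclasses import dataclass


@dataclass
class SlotConfig:
    """Per-TaskTracker slot counts: MS (map slots) and RS (reduce slots)."""
    map_slots: int
    reduce_slots: int


def adjust_slots(cfg: SlotConfig, cpu_util: float,
                 low: float = 0.4, high: float = 0.85,
                 min_slots: int = 1, max_slots: int = 8) -> SlotConfig:
    """Return a new slot configuration for one slave node.

    If the node's CPU utilization is below `low`, offer one more slot of
    each kind (up to `max_slots`); if it is above `high`, withdraw one
    slot of each kind (down to `min_slots`); otherwise leave it alone.
    """
    if cpu_util < low:
        # Underloaded node: let the scheduler place more tasks here.
        return SlotConfig(min(cfg.map_slots + 1, max_slots),
                          min(cfg.reduce_slots + 1, max_slots))
    if cpu_util > high:
        # Overloaded node: shed load by shrinking the slot counts.
        return SlotConfig(max(cfg.map_slots - 1, min_slots),
                          max(cfg.reduce_slots - 1, min_slots))
    # Load is in the comfortable band: keep the current configuration.
    return cfg
```

A load-driven scheduler would then simply prefer TaskTrackers with free slots, so that underloaded nodes (whose slot counts the DSC has grown) naturally receive more tasks than overloaded ones.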