Distributed Scheduling Extension on Hadoop

Authors:
Zeng Dadan; Wang Xieqin; Jiang Ningkang
Affiliations:
Software Engineering Institute, East China Normal University (all authors)
Venue:
CloudCom '09: Proceedings of the 1st International Conference on Cloud Computing
Year:
2009

Abstract

Distributed computing splits a large-scale job into multiple tasks and processes them on a cluster. Cluster resource allocation is a key factor limiting the efficiency of a distributed computing platform. Hadoop is currently the most popular open-source distributed platform. However, its existing scheduling strategies are rather simple and cannot meet needs such as sharing a cluster among multiple users, guaranteeing capacity for each job, and providing good performance for interactive jobs. This paper examines the existing scheduling strategies, analyses their shortcomings, and adds three new features to Hadoop: temporarily raising a job's weight, letting higher-priority jobs preempt cluster resources, and sharing computing resources among multiple users. Experiments show that these features improve performance for interactive jobs and share computing time more fairly among users.
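The abstract names the three features but not their algorithms. The Python sketch below shows one plausible way they could fit together in a fair-share slot scheduler. It is not the authors' implementation and does not use Hadoop's scheduler API. Everything in it is a hypothetical assumption chosen for illustration: the names (`Job`, `boost`, `fair_shares`, `schedule`), the equal split of slots per user, the weighted split within a user, and the rule that a job may only preempt strictly lower-priority jobs running above their share.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical job model; not Hadoop's JobInProgress.
    name: str
    user: str
    weight: float = 1.0
    priority: int = 0        # assumption: larger value = more important
    boost: float = 1.0       # temporary weight multiplier
    boost_until: float = 0.0 # time at which the boost expires
    running: int = 0         # slots currently held
    demand: int = 0          # tasks still waiting for a slot

    def effective_weight(self, now: float) -> float:
        # The boost only counts until its expiry time.
        return self.weight * (self.boost if now < self.boost_until else 1.0)


def boost(job: Job, factor: float, duration: float, now: float) -> None:
    """Temporarily raise a job's weight, e.g. for an interactive query."""
    job.boost = factor
    job.boost_until = now + duration


def fair_shares(jobs, total_slots, now):
    """Split slots equally among users, then by effective weight within a user."""
    if not jobs:
        return {}
    by_user = {}
    for j in jobs:
        by_user.setdefault(j.user, []).append(j)
    per_user = total_slots / len(by_user)
    shares = {}
    for user_jobs in by_user.values():
        total_w = sum(j.effective_weight(now) for j in user_jobs)
        for j in user_jobs:
            shares[j.name] = per_user * j.effective_weight(now) / total_w
    return shares


def schedule(jobs, total_slots, now):
    """Fill jobs below their share; preempt lower-priority over-share jobs if full."""
    shares = fair_shares(jobs, total_slots, now)
    free = total_slots - sum(j.running for j in jobs)
    preempted = []
    while True:
        needy = [j for j in jobs if j.demand > 0 and j.running < shares[j.name]]
        if not needy:
            break
        # Highest priority first, then the job furthest below its share.
        job = max(needy, key=lambda j: (j.priority, shares[j.name] - j.running))
        if free == 0:
            victims = [v for v in jobs
                       if v.running > shares[v.name] and v.priority < job.priority]
            if not victims:
                break
            # Kill one task of the lowest-priority, most over-share job.
            victim = min(victims, key=lambda v: (v.priority, shares[v.name] - v.running))
            victim.running -= 1
            victim.demand += 1  # the killed task goes back to the queue
            preempted.append(victim.name)
            free += 1
        job.running += 1
        job.demand -= 1
        free -= 1
    # Work-conserving pass: leftover slots go to any job that still has demand.
    for j in sorted(jobs, key=lambda j: -j.priority):
        take = min(free, j.demand)
        j.running += take
        j.demand -= take
        free -= take
    return preempted


if __name__ == "__main__":
    # Alice's batch job holds all 4 slots; Bob submits a higher-priority query.
    jobs = [Job("batch", "alice", running=4, demand=10),
            Job("query", "bob", priority=1, demand=2)]
    print(schedule(jobs, total_slots=4, now=0.0))  # ['batch', 'batch']
    print([(j.name, j.running) for j in jobs])     # [('batch', 2), ('query', 2)]
```

In the demo, Bob's query reclaims two of the batch job's slots, so each user ends up with half of the cluster. Preemption only ever moves a slot to a strictly higher-priority job, which guarantees the loop terminates. A `boost` changes a job's share only until `boost_until`, after which it falls back to its normal weight.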