Tarazu: optimizing MapReduce on heterogeneous clusters
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
MARLA: MapReduce for Heterogeneous Clusters
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
CASH: context aware scheduler for Hadoop
Proceedings of the International Conference on Advances in Computing, Communications and Informatics
Optimizing and Tuning MapReduce Jobs to Improve the Large-Scale Data Analysis Process
International Journal of Intelligent Systems
Cogset: a high performance MapReduce engine
Concurrency and Computation: Practice & Experience
A cloud-based intelligent TV program recommendation system
Computers and Electrical Engineering
MapReduce "garbage" collection
CASCON '13 Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research
Hi-index | 0.00 |
Hadoop is seriously limited by its MapReduce scheduler which does not scale well in heterogeneous environment. Heterogenous environment is characterized by various devices which vary greatly with respect to the capacities of computation and communication, architectures, memorizes and power. As an important extension of Hadoop, LATE MapReduce scheduling algorithm takes heterogeneous environment into consideration. However, it falls short of solving the crucial problem – poor performance due to the static manner in which it computes progress of tasks. Consequently, neither Hadoop nor LATE schedulers are desirable in heterogeneous environment. To this end, we propose SAMR: a Self-Adaptive MapReduce scheduling algorithm, which calculates progress of tasks dynamically and adapts to the continuously varying environment automatically. When a job is committed, SAMR splits the job into lots of fine-grained map and reduce tasks, then assigns them to a series of nodes. Meanwhile, it reads historical information which stored on every node and updated after every execution. Then, SAMR adjusts time weight of each stage of map and reduce tasks according to the historical information respectively. Thus, it gets the progress of each task accurately and finds which tasks need backup tasks. What’s more, it identifies slow nodes and classifies them into the sets of slow nodes dynamically. According to the information of these slow nodes, SAMR will not launch backup tasks on them, ensuring the backup tasks will not be slow tasks any more. It gets the final results of the fine-grained tasks when either slow tasks or backup tasks finish first. The proposed algorithm is evaluated by extensive experiments over various heterogeneous environment. Experimental results show that SAMR significantly decreases the time of execution up to 25% compared with Hadoop’s scheduler and up to 14% compared with LATE scheduler.