A Data Distribution Aware Task Scheduling Strategy for MapReduce System

Authors:
Leitao Guo; Hongwei Sun; Zhiguo Luo
Affiliations:
China Mobile Research Institute, Beijing, P.R. China 100053 (all three authors)
Venue:
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing
Year:
2009

Abstract

MapReduce is a parallel programming system for processing massive data. It automatically splits MapReduce jobs into multiple tasks and schedules them onto a cluster built from PCs. This paper describes a data distribution aware MapReduce task scheduling strategy. When a worker node requests a task, the strategy computes the node's priority from factors such as the number of times the node has requested tasks and the number of tasks it can execute locally. It likewise computes each task's priority from factors such as the number of copies associated with the task and the task's latency. By scheduling on these node and task priorities, the strategy fully accounts for how data is distributed across the system. It therefore assigns Map tasks, with high probability, to nodes that already hold their data, which reduces network overhead and improves system efficiency.
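The abstract names the inputs to the node and task priorities but not how they are combined, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes simple linear weights and a hypothetical admission threshold. It reads "copies" as replicas of a task's input data and "latency" as how long a task has been pending, and both readings are interpretations. All names (`MapTask`, `Node`, `task_priority`, `node_priority`, `on_request`, the `W_*` weights, `NODE_THRESHOLD`) are hypothetical and come from neither the paper nor Hadoop.

```python
from dataclasses import dataclass

# Hypothetical weights and threshold: the abstract names the factors
# but not how they are combined.
W_REQUEST = 1.0       # weight of how many times a node has asked for work
W_LOCAL = 2.0         # weight of how many pending tasks are local to the node
W_LATENCY = 0.1       # weight of how long a task has been waiting
W_COPIES = 1.0        # weight of how scarce a task's data copies are
NODE_THRESHOLD = 2.0  # minimum node priority needed to receive a task


@dataclass
class MapTask:
    task_id: str
    replica_nodes: set    # nodes assumed to hold a copy of the input data
    submitted_at: float   # time the task became pending


@dataclass
class Node:
    name: str
    requests: int = 0     # requests since this node last received a task


def task_priority(task, now):
    # Longer-waiting tasks, and tasks whose data has fewer copies (so fewer
    # nodes can run them locally), get a higher priority.
    latency = now - task.submitted_at
    return W_LATENCY * latency + W_COPIES / max(len(task.replica_nodes), 1)


def node_priority(node, pending):
    # Grows with repeated requests and with the number of local tasks.
    local = sum(1 for t in pending if node.name in t.replica_nodes)
    return W_REQUEST * node.requests + W_LOCAL * local


def on_request(node, pending, now):
    """Return a MapTask for `node`, or None if it should ask again later."""
    if not pending:
        return None
    node.requests += 1
    if node_priority(node, pending) < NODE_THRESHOLD:
        return None  # hold the work back for nodes that hold the data
    local = [t for t in pending if node.name in t.replica_nodes]
    candidates = local or pending  # prefer data-local tasks
    chosen = max(candidates, key=lambda t: task_priority(t, now))
    pending.remove(chosen)
    node.requests = 0
    return chosen


pending = [
    MapTask("m1", {"nodeA", "nodeB", "nodeC"}, submitted_at=0.0),
    MapTask("m2", {"nodeA"}, submitted_at=0.0),
    MapTask("m3", {"nodeB"}, submitted_at=5.0),
]
a, d = Node("nodeA"), Node("nodeD")
print(on_request(a, pending, now=10.0).task_id)  # m2: local, single copy
print(on_request(d, pending, now=10.0))          # None: nodeD holds no data
print(on_request(d, pending, now=11.0).task_id)  # m3: nodeD has asked twice
```

Under these assumed weights, a node with no local data receives remote work only after repeated requests. This bounded wait gives data-holding nodes the first chance at each Map task. Raising `NODE_THRESHOLD` relative to `W_REQUEST` favors locality at the cost of longer idle time on nodes without data.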