In data-intensive computing, MapReduce is an important tool that allows users to process large amounts of data easily. Its data-locality-aware scheduling strategy exploits the locality of data access to minimize data movement and thus reduce network traffic. In this paper, we first analyze state-of-the-art MapReduce scheduling algorithms and demonstrate that they do not guarantee optimal scheduling. We then mathematically reformulate the scheduling problem, using a cost matrix to capture the cost of data staging, and propose an algorithm, lsap-sched, that yields optimal data locality. In addition, we integrate fairness and data locality into a unified algorithm, lsap-fair-sched, in which users can easily adjust the tradeoff between data locality and fairness. Finally, extensive simulation experiments show that our algorithms can improve the ratio of data-local tasks by up to 14%, reduce data movement cost by up to 90%, and balance fairness and data locality effectively.
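The abstract does not give the exact formulation. The Python sketch below is only an illustration of the general idea under stated assumptions. The name lsap-sched suggests the linear sum assignment problem (LSAP), so the sketch builds a cost matrix and solves it with the textbook O(n³) Hungarian algorithm. Its assumptions are:

- Rows of the matrix are idle slots, identified by node name.
- Columns are pending map tasks.
- A cell costs 0 when the slot's node holds a replica of the task's input, and a uniform remote cost otherwise. The paper's matrix captures actual data-staging cost; the uniform 1.0 is our simplification.
- Zero-cost dummy rows or columns pad the matrix to square.

All function names (`hungarian`, `lsap_schedule`, `greedy_schedule`, `local_count`) are hypothetical. `greedy_schedule` is a simplified slot-by-slot rule, not Hadoop's actual scheduler. It is included to show how deciding one slot at a time can miss the optimum.

```python
from typing import Dict, List, Set

INF = float("inf")


def hungarian(cost: List[List[float]]) -> List[int]:
    """O(n^3) Hungarian algorithm for a square cost matrix.

    Returns assign, where assign[i] is the column matched to row i,
    minimising the total cost.
    """
    n = len(cost)
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (n + 1)  # column potentials
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = free)
    way = [0] * (n + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:  # grow the alternating tree until a free column is reached
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):  # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip matched/unmatched edges along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assign = [0] * n
    for j in range(1, n + 1):
        assign[p[j] - 1] = j - 1
    return assign


def lsap_schedule(idle_slots: List[str], task_replicas: Dict[str, Set[str]],
                  remote_cost: float = 1.0) -> Dict[str, str]:
    """Optimal task -> slot assignment under a simplified locality cost model."""
    tasks = list(task_replicas)
    n = max(len(idle_slots), len(tasks))
    cost = [[0.0] * n for _ in range(n)]  # zero-cost dummies pad to n x n
    for r, node in enumerate(idle_slots):
        for c, task in enumerate(tasks):
            # 0 if the slot's node holds an input replica, else a staging cost
            cost[r][c] = 0.0 if node in task_replicas[task] else remote_cost
    assign = hungarian(cost)
    # Keep only real (slot, task) pairs; anything matched to a dummy is unscheduled.
    return {tasks[c]: idle_slots[r] for r, c in enumerate(assign)
            if r < len(idle_slots) and c < len(tasks)}


def greedy_schedule(idle_slots: List[str],
                    task_replicas: Dict[str, Set[str]]) -> Dict[str, str]:
    """Simplified slot-by-slot rule: each slot takes the first pending local
    task, otherwise the first pending task (illustrative, not Hadoop's code)."""
    pending = list(task_replicas)
    result: Dict[str, str] = {}
    for node in idle_slots:
        if not pending:
            break
        local = [t for t in pending if node in task_replicas[t]]
        task = local[0] if local else pending[0]
        pending.remove(task)
        result[task] = node
    return result


def local_count(assignment: Dict[str, str],
                task_replicas: Dict[str, Set[str]]) -> int:
    """Number of tasks placed on a node that holds their input."""
    return sum(node in task_replicas[t] for t, node in assignment.items())


if __name__ == "__main__":
    slots = ["n1", "n2"]
    replicas = {"t1": {"n1", "n2"}, "t2": {"n1"}}
    print(local_count(greedy_schedule(slots, replicas), replicas))  # 1
    print(local_count(lsap_schedule(slots, replicas), replicas))    # 2
```

Consider two idle slots on nodes n1 and n2. Task t1's input is replicated on both nodes, but task t2's input is only on n1. The greedy rule hands n1 to t1 first and leaves t2 to read its data remotely on n2, giving one data-local task. The assignment solver looks at all slots and tasks together. It places t2 on n1 and t1 on n2, so both tasks are data-local.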