Job scheduling for optimizing data locality in Hadoop clusters

Authors:
Aprigio Bezerra;Porfídio Hernández;Antonio Espinosa;Juan Carlos Moure
Affiliations:
Universitat Autonoma de Barcelona, Bellaterra, Spain;Universitat Autonoma de Barcelona, Bellaterra, Spain;Universitat Autonoma de Barcelona, Bellaterra, Spain;Universitat Autonoma de Barcelona, Bellaterra, Spain
Venue:
Proceedings of the 20th European MPI Users' Group Meeting
Year:
2013

Abstract

We describe how non-dedicated clusters can be shared between a known group of local applications and additional bioinformatics MapReduce applications. We study how to use the shared resources effectively while both application types execute. To keep the execution times of the local applications unaffected, we tune a group of Hadoop platform parameters. One of the most relevant is the job scheduling policy. Our aim is to group tasks from different jobs that handle the same data blocks so that they run on the node where those blocks are stored. Experimental results show that our approach outperforms traditional policies.
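
The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' scheduler and does not use any Hadoop API. The function name `group_tasks_by_block`, the `(job_id, block_id)` task tuples, the `block_locations` map, and the least-loaded tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def group_tasks_by_block(tasks, block_locations):
    """Group tasks that read the same block and place each group on one replica holder.

    tasks: list of (job_id, block_id) pairs (hypothetical representation).
    block_locations: dict mapping block_id -> list of nodes storing a replica.
    Returns a dict mapping node -> list of (job_id, block_id) assigned to it.
    """
    # Step 1: collect tasks from all jobs that handle the same data block.
    by_block = defaultdict(list)
    for job_id, block_id in tasks:
        by_block[block_id].append((job_id, block_id))

    # Step 2: send each group to a node that stores the block.
    # Assumption: among replica holders, pick the one with the fewest
    # tasks assigned so far (ties go to the first node listed).
    schedule = {}
    for block_id, group in by_block.items():
        replicas = block_locations[block_id]
        node = min(replicas, key=lambda n: len(schedule.get(n, [])))
        schedule.setdefault(node, []).extend(group)
    return schedule


if __name__ == "__main__":
    tasks = [("job1", "b1"), ("job2", "b1"), ("job1", "b2"), ("job3", "b2")]
    block_locations = {"b1": ["nodeA", "nodeB"], "b2": ["nodeA", "nodeC"]}
    for node, assigned in group_tasks_by_block(tasks, block_locations).items():
        print(node, assigned)
```

In this sketch every task runs on a node that holds its input block, and tasks from different jobs that share a block land on the same node, so the block is read locally once per group rather than fetched over the network per job.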