Job scheduling for optimizing data locality in Hadoop clusters

  • Authors:
  • Aprigio Bezerra;Porfídio Hernández;Antonio Espinosa;Juan Carlos Moure

  • Affiliations:
  • Universitat Autonoma de Barcelona, Bellaterra, Spain;Universitat Autonoma de Barcelona, Bellaterra, Spain;Universitat Autonoma de Barcelona, Bellaterra, Spain;Universitat Autonoma de Barcelona, Bellaterra, Spain

  • Venue:
  • Proceedings of the 20th European MPI Users' Group Meeting
  • Year:
  • 2013

Abstract

We describe the use of non-dedicated clusters in which a known group of local applications shares the computational resources with additional bioinformatics MapReduce applications. We study how to use the resources shared by both application types effectively during their execution. To keep the execution times of the local applications unaffected, we consider how to configure a group of Hadoop platform parameters. One of the most relevant aspects is the job scheduling policy. Our aim is to group tasks from different jobs that handle the same data blocks so that they run on the node where those blocks are stored. Experimental results show that our approach outperforms traditional policies.
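
The grouping idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' scheduler and does not use any Hadoop API: the function name `group_tasks_by_block`, the `(job_id, block_id)` task representation, the `block_locations` map, and the tie-breaking rule (pick the replica holder with the fewest tasks assigned so far) are all assumptions made for illustration.

```python
from collections import defaultdict


def group_tasks_by_block(tasks, block_locations):
    """Assign tasks that read the same block to one node holding that block.

    tasks: list of (job_id, block_id) pairs (hypothetical representation).
    block_locations: dict mapping block_id to the list of nodes that store
        a replica of that block (hypothetical representation).
    Returns a dict mapping node to the list of tasks scheduled on it.
    """
    # Step 1: collect tasks from all jobs that handle the same data block.
    by_block = defaultdict(list)
    for job_id, block_id in tasks:
        by_block[block_id].append((job_id, block_id))

    # Step 2: place each group on a node that already stores the block.
    schedule = {}
    for block_id, group in by_block.items():
        replicas = block_locations[block_id]
        # Assumed tie-break: the replica holder with the fewest tasks so far.
        node = min(replicas, key=lambda n: len(schedule.get(n, [])))
        schedule.setdefault(node, []).extend(group)
    return schedule


# Illustrative input: three jobs, three blocks, each block stored on two nodes.
tasks = [
    ("job1", "b1"), ("job2", "b1"),
    ("job1", "b2"), ("job3", "b2"),
    ("job2", "b3"),
]
block_locations = {
    "b1": ["n1", "n2"],
    "b2": ["n2", "n3"],
    "b3": ["n1", "n3"],
}
schedule = group_tasks_by_block(tasks, block_locations)
```

In this sketch every task runs on a node that holds a replica of its input block, and tasks from different jobs that read the same block end up on the same node. A real scheduler would also have to account for free task slots and for the load of the local applications, which this example ignores.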