Evolving toward the perfect schedule: co-scheduling job assignments and data replication in wide-area systems using a genetic algorithm

  • Authors:
  • Thomas Phan; Kavitha Ranganathan; Radu Sion

  • Affiliations:
  • IBM Almaden Research Center; IBM T.J. Watson Research Center; Stony Brook University

  • Venue:
  • JSSPP'05 Proceedings of the 11th international conference on Job Scheduling Strategies for Parallel Processing
  • Year:
  • 2005

Abstract

Traditional job schedulers for grid or cluster systems assign incoming jobs to compute nodes so that some evaluative condition is met. Such systems generally consider the availability of compute cycles, queue lengths, and expected job execution times, but they typically do not account directly for data staging and thus miss significant opportunities for optimisation. Intuitively, tighter integration of job scheduling and automated data replication can yield significant advantages, because optimised, faster access to data decreases overall execution time. In this paper we treat data placement as a first-class citizen in scheduling and use an optimisation heuristic to generate schedules. We make two contributions. First, we identify the need to co-schedule job dispatching and data replication assignments, and posit that scheduling both simultaneously is critical for achieving good makespans. Second, we show that using a genetic search algorithm to solve the optimal allocation problem has the potential to achieve significant speed-ups over traditional allocation mechanisms. Through simulation, we show that our algorithm achieves makespans that are, on average, approximately 20-45% faster than those of greedy schedulers.
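
To make the co-scheduling idea concrete, here is a minimal Python sketch of a genetic search over a combined chromosome that holds both a job-to-node assignment and a replica placement for each data object. It is not the authors' algorithm: the abstract does not describe their encoding, operators, or cost model, so everything below is an assumption chosen for illustration. That includes the toy instance (`COMPUTE`, `NEEDS`, `ORIGIN`, `FETCH`), the fixed remote-read penalty, the one-replica-per-object limit, the truncation selection with elitism, and all function names.

```python
import random

# Hypothetical toy instance: every number here is invented for illustration.
NUM_NODES = 3
COMPUTE = [4, 2, 6, 3, 5, 1]   # compute time of each job
NEEDS = [0, 0, 1, 1, 2, 2]     # data object read by each job
ORIGIN = [0, 1, 2]             # node that initially holds each data object
FETCH = [5, 3, 4]              # assumed extra time to read each object remotely


def makespan(chrom):
    """Finish time of the busiest node for a (job_nodes, replica_nodes) pair."""
    job_nodes, replica_nodes = chrom
    load = [0] * NUM_NODES
    for job, node in enumerate(job_nodes):
        obj = NEEDS[job]
        # Data is local if the job runs at the origin or at the chosen replica.
        local = node in (ORIGIN[obj], replica_nodes[obj])
        load[node] += COMPUTE[job] + (0 if local else FETCH[obj])
    return max(load)


def random_chrom(rng):
    """Random job assignment plus one random replica site per data object."""
    jobs = [rng.randrange(NUM_NODES) for _ in COMPUTE]
    replicas = [rng.randrange(NUM_NODES) for _ in ORIGIN]
    return (jobs, replicas)


def crossover(a, b, rng):
    """One-point crossover applied separately to each half of the chromosome."""
    child = []
    for part_a, part_b in zip(a, b):
        cut = rng.randrange(1, len(part_a))
        child.append(part_a[:cut] + part_b[cut:])
    return tuple(child)


def mutate(chrom, rng, rate=0.1):
    """Reassign each gene to a random node with probability `rate`."""
    return tuple(
        [rng.randrange(NUM_NODES) if rng.random() < rate else gene for gene in part]
        for part in chrom
    )


def evolve(generations=50, pop_size=30, seed=0):
    """Return the best chromosome found and the best makespan per generation."""
    rng = random.Random(seed)
    pop = [random_chrom(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=makespan)
        history.append(makespan(pop[0]))
        parents = pop[: pop_size // 2]   # truncation selection
        children = [pop[0]]              # elitism: keep the current best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = children
    return min(pop, key=makespan), history


if __name__ == "__main__":
    best, history = evolve()
    print("job -> node:", best[0])
    print("replica sites:", best[1])
    print("makespan:", makespan(best))
```

Because job placement and replica placement sit in the same chromosome, a single fitness evaluation captures their interaction: moving a job to a lightly loaded node only pays off if the data it reads is, or becomes, local there. A realistic model would also charge for creating replicas and for network contention, both of which this sketch ignores.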