An SCP-based heuristic approach for scheduling distributed data-intensive applications on global grids

  • Authors:
  • Srikumar Venugopal;Rajkumar Buyya

  • Affiliations:
  • Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, VIC 3010, Australia;Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, VIC 3010, Australia

  • Venue:
  • Journal of Parallel and Distributed Computing
  • Year:
  • 2008

Quantified Score

Hi-index 0.01

Visualization

Abstract

Data-intensive Grid applications need access to large data sets that may each be replicated on different resources. Minimizing the overhead of transferring these data sets to the resources where the applications are executed requires that appropriate computational and data resources be selected. In this paper, we consider the problem of scheduling an application composed of a set of independent tasks, each of which requires multiple data sets that are each replicated on multiple resources. We break this problem into two parts: one, to match each task (or job) to one compute resource for executing the job and one storage resource each for accessing each data set required by the job and two, to assign the set of tasks to the selected resources. We model the first part as an instance of the well-known Set Covering Problem (SCP) and apply a known heuristic for SCP to match jobs to resources. The second part is tackled by extending existing MinMin and Sufferage algorithms to schedule the set of distributed data-intensive tasks. Through simulation, we experimentally compare the SCP-based matching heuristic to others in conjunction with the task scheduling algorithms and present the results.