An SCP-based heuristic approach for scheduling distributed data-intensive applications on global grids

Authors:
Srikumar Venugopal;Rajkumar Buyya
Affiliations:
Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, VIC 3010, Australia;Grid Computing and Distributed Systems (GRIDS) Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, VIC 3010, Australia
Venue:
Journal of Parallel and Distributed Computing
Year:
2008


Abstract

Data-intensive Grid applications need access to large data sets, each of which may be replicated on different resources. Minimizing the overhead of transferring these data sets to the resources where the applications execute requires selecting appropriate computational and data resources. In this paper, we consider the problem of scheduling an application composed of a set of independent tasks, each of which requires multiple data sets that are each replicated on multiple resources. We break this problem into two parts: first, matching each task (or job) to one compute resource for executing the job and to one storage resource for each data set the job requires; and second, assigning the set of tasks to the selected resources. We model the first part as an instance of the well-known Set Covering Problem (SCP) and apply a known heuristic for SCP to match jobs to resources. We tackle the second part by extending the existing MinMin and Sufferage algorithms to schedule the set of distributed data-intensive tasks. Through simulation, we compare the SCP-based matching heuristic with other matching heuristics in conjunction with the task scheduling algorithms and present the results.
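
The two-part decomposition described in the abstract can be illustrated with a minimal sketch. The abstract does not say which SCP heuristic the authors apply or how they extend MinMin, so the code below is an assumption: it uses the classic greedy set-cover heuristic to choose storage resources for one job's data sets, and the textbook MinMin rule to order jobs. All names (`greedy_set_cover`, `min_min`, `replicas`, `exec_time`) and the sample values are hypothetical, and the sketch ignores compute-resource selection during matching, which the paper treats jointly.

```python
from typing import Dict, List, Optional, Set, Tuple


def greedy_set_cover(required: Set[str],
                     replicas: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each required data set to a storage resource (greedy set cover).

    replicas[resource] is the set of data sets stored on that resource.
    This is an assumed heuristic, not necessarily the one used in the paper.
    """
    uncovered = set(required)
    assignment: Dict[str, str] = {}
    while uncovered:
        # Pick the resource holding the most still-uncovered data sets.
        # Iterating in sorted order makes tie-breaking deterministic.
        best: Optional[str] = max(
            sorted(replicas),
            key=lambda r: len(replicas[r] & uncovered),
            default=None,
        )
        gained = replicas[best] & uncovered if best is not None else set()
        if not gained:
            raise ValueError(f"no replica available for: {sorted(uncovered)}")
        for data_set in gained:
            assignment[data_set] = best
        uncovered -= gained
    return assignment


def min_min(exec_time: Dict[str, Dict[str, float]]) -> List[Tuple[str, str, float]]:
    """Textbook MinMin over a hypothetical table exec_time[job][resource].

    Each entry is an assumed estimate of run time including data transfer.
    Returns (job, resource, completion_time) in scheduling order.
    """
    ready = {r: 0.0 for times in exec_time.values() for r in times}
    pending = set(exec_time)
    schedule: List[Tuple[str, str, float]] = []
    while pending:
        # The smallest completion time over all (job, resource) pairs is the
        # minimum, across jobs, of each job's minimum completion time.
        finish, job, res = min(
            (ready[r] + t, j, r)
            for j in sorted(pending)
            for r, t in sorted(exec_time[j].items())
        )
        schedule.append((job, res, finish))
        ready[res] = finish  # the chosen resource is busy until then
        pending.remove(job)
    return schedule


if __name__ == "__main__":
    replicas = {"s1": {"d1", "d2"}, "s2": {"d2", "d3"}, "s3": {"d3"}}
    print(greedy_set_cover({"d1", "d2", "d3"}, replicas))

    exec_time = {
        "j1": {"c1": 4.0, "c2": 6.0},
        "j2": {"c1": 3.0, "c2": 5.0},
        "j3": {"c1": 8.0, "c2": 2.0},
    }
    print(min_min(exec_time))
```

In this toy instance the greedy step serves all three data sets from two storage resources instead of three, which is the intuition behind treating matching as a covering problem: fewer distinct sources per job generally means fewer separate transfers to set up.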