New worker-centric scheduling strategies for data-intensive grid applications

Authors:
Steven Y. Ko;Ramsés Morales;Indranil Gupta
Affiliations:
University of Illinois, Urbana-Champaign, Urbana, IL;University of Illinois, Urbana-Champaign, Urbana, IL;University of Illinois, Urbana-Champaign, Urbana, IL
Venue:
Proceedings of the ACM/IFIP/USENIX 2007 International Conference on Middleware
Year:
2007

Abstract

Distributed computations that deal with large amounts of data are scheduled in Grid clusters today using either a task-centric or a worker-centric mechanism. Because the data sets are large, execution time is bounded by the cost of data transfer. In this paper, we introduce new worker-centric scheduling strategies that are novel in that they implicitly exploit the locality of interest in order to reduce the cost of data transfer. Many Grid applications are characterized by such a locality of interest, i.e., a file is often accessed by multiple tasks and, more importantly, a set of files accessed by one task is also likely to be accessed together by other tasks. Our new deterministic and probabilistic scheduling algorithms implicitly exploit this feature to improve running time. Our experiments use traces of a real Grid application (Coadd) and show that our algorithms achieve utilization of over 90% while significantly reducing makespan compared to task-centric approaches.
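
To make the idea of worker-centric scheduling with locality of interest concrete, the following is a minimal illustrative sketch, not the algorithms proposed in the paper (the abstract does not specify their metrics). It assumes a single worker that, whenever it becomes idle, asks for the pending task whose input files overlap most with the files it already holds. The names `pick_task` and `run_worker`, the tie-breaking rule, and the toy cache that keeps only the most recent task's files are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def pick_task(cached: Set[str], pending: Dict[str, Set[str]]) -> str:
    """Pick the pending task with the largest file overlap with the cache.

    Ties are broken by fewest files still to fetch, then by task name,
    so the choice is deterministic (hypothetical rule, not from the paper).
    """
    if not pending:
        raise ValueError("no pending tasks")
    return min(
        pending,
        key=lambda t: (-len(pending[t] & cached), len(pending[t] - cached), t),
    )


def run_worker(
    pending: Dict[str, Set[str]], locality_aware: bool = True
) -> Tuple[List[str], int]:
    """Simulate one worker draining the queue.

    Returns the order in which tasks ran and the number of file transfers.
    Assumption: the worker's cache holds only the files of the last task.
    """
    remaining = dict(pending)  # copy so the caller's dict is untouched
    cached: Set[str] = set()
    order: List[str] = []
    transfers = 0
    while remaining:
        if locality_aware:
            task = pick_task(cached, remaining)
        else:
            task = next(iter(remaining))  # plain submission order
        files = remaining.pop(task)
        transfers += len(files - cached)  # only missing files are fetched
        cached = set(files)
        order.append(task)
    return order, transfers


if __name__ == "__main__":
    tasks = {
        "t1": {"a", "b"},
        "t2": {"c", "d"},
        "t3": {"a", "b", "e"},
        "t4": {"c", "d"},
    }
    print(run_worker(tasks))                        # (['t1', 't3', 't2', 't4'], 5)
    print(run_worker(tasks, locality_aware=False))  # (['t1', 't2', 't3', 't4'], 9)
```

In this toy trace, grouping tasks that share files (t1 with t3, t2 with t4) cuts transfers from 9 to 5. The example only illustrates why file-set overlap matters when a worker chooses its next task; it says nothing about the utilization or makespan figures reported in the paper.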