Data-intensive distributed computations are scheduled on Grid clusters today using either a task-centric or a worker-centric mechanism. Because the data sets are large, execution time is bounded by the cost of data transfer. In this paper, we introduce new worker-centric scheduling strategies that implicitly exploit locality of interest to reduce the cost of data transfer. Many Grid applications exhibit such locality of interest: a file is often accessed by multiple tasks and, more importantly, a set of files accessed together by one task is also likely to be accessed together by other tasks. Our new deterministic and probabilistic scheduling algorithms implicitly exploit this feature to improve running time. Our experiments use traces of a real Grid application (Coadd) and show that our algorithms achieve utilization of over 90% while significantly reducing makespan compared to task-centric approaches.
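To make the worker-centric idea concrete, the following is a minimal sketch, not the algorithms evaluated in the paper. It assumes that an idle worker pulls the pending task with the fewest files missing from its local cache, and that the cache is unbounded. The names `pick_task` and `run_worker`, the "fewest missing files" metric, and the tie-breaking rule are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple


def pick_task(cached_files: Set[str], pending: Dict[str, Set[str]]) -> Optional[str]:
    """Return the pending task that needs the fewest files not already cached.

    Assumed metric (not taken from the paper): number of missing files.
    Ties are broken by task id so the choice is deterministic.
    """
    if not pending:
        return None
    return min(pending, key=lambda t: (len(pending[t] - cached_files), t))


def run_worker(cached_files: Set[str],
               pending: Dict[str, Set[str]]) -> Tuple[List[str], int]:
    """Simulate a single worker draining the queue.

    Returns the execution order and the number of file transfers.
    Assumes an unbounded local cache (no eviction).
    """
    cache = set(cached_files)
    remaining = dict(pending)
    order: List[str] = []
    transferred = 0
    while remaining:
        task = pick_task(cache, remaining)
        needed = remaining.pop(task)
        transferred += len(needed - cache)  # fetch only the missing files
        cache |= needed                     # fetched files stay cached
        order.append(task)
    return order, transferred


if __name__ == "__main__":
    tasks = {"t1": {"a", "b"}, "t2": {"c", "d"}, "t3": {"b", "c"}}
    print(run_worker({"a"}, tasks))
```

In this toy run the worker starts with file `a` cached, so it takes `t1` first, then `t3` (which shares `b`), and finally `t2`. Overlapping file sets are what let the worker avoid repeated transfers. A probabilistic variant could sample a task with probability weighted by overlap instead of always taking the minimum, but that weighting would again be an assumption.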