Data-driven batch scheduling
In this paper, we develop data-driven strategies for batch computing schedulers. Current CPU-centric batch schedulers ignore the data needs of workloads and execute those workloads by linking them transparently and directly to the data they need. When workloads are scheduled on remote computational resources, this elegant solution of direct data access can incur an order-of-magnitude performance penalty for data-intensive workloads. Adding data-awareness to batch schedulers allows careful coordination of data and CPU allocation, thereby reducing the cost of remote execution. We offer new techniques by which batch schedulers can become data-driven. Such systems can use our analytical predictive models to select one of the four data-driven scheduling policies that we have created. Through simulation, we demonstrate the accuracy of our predictive models and show how they can reduce time to completion for some workloads by as much as 80%.