Heterogeneous clusters and grid infrastructures are becoming increasingly popular. In these computing infrastructures, machines differ in their resources, including memory size, disk space, and installed software packages. These differences give rise to the problem of over-provisioning, that is, sub-optimal utilization of a cluster because users request resource capacities greater than their jobs actually need. Our analysis of a real workload trace (LANL CM5) revealed differences of up to two orders of magnitude between requested memory capacity and actual memory usage. This paper presents an algorithm that estimates the resource capacities actually used by batch jobs. Such an algorithm reduces the need for users to correctly predict the resources required by their jobs, while at the same time allowing the scheduling system to achieve better utilization of the available hardware. The algorithm is based on the Reinforcement Learning paradigm: it learns its estimation policy on-line and dynamically modifies it according to the overall cluster load. The paper includes simulation results indicating that our algorithm can yield an improvement of over 30% in utilization (overall throughput) of heterogeneous clusters.
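To make the idea of learning an estimation policy on-line more concrete, the following is a minimal sketch, not the algorithm from the paper. It assumes a simple epsilon-greedy bandit that picks a scaling factor to apply to each job's requested memory and is rewarded for the memory it frees, or penalized when the estimate falls below what the job actually uses. The class `MemoryEstimator`, the candidate factors, the reward function, and the synthetic workload are all hypothetical choices made for illustration, and the dependence on overall cluster load described above is omitted.

```python
import random


class MemoryEstimator:
    """Epsilon-greedy bandit over hypothetical scaling factors for requested memory."""

    def __init__(self, factors=(1.0, 0.5, 0.25, 0.1), epsilon=0.1, seed=0):
        self.factors = list(factors)           # candidate fractions of the request
        self.epsilon = epsilon                 # exploration probability
        self.rng = random.Random(seed)
        self.counts = [0] * len(self.factors)  # times each factor was tried
        self.values = [0.0] * len(self.factors)  # running mean reward per factor

    def choose(self):
        """Return the index of the factor to apply to the next job."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.factors))  # explore
        return max(range(len(self.factors)), key=lambda i: self.values[i])  # exploit

    def update(self, arm, reward_value):
        """Incrementally update the mean reward of the chosen factor."""
        self.counts[arm] += 1
        self.values[arm] += (reward_value - self.values[arm]) / self.counts[arm]

    def best_factor(self):
        """Factor with the highest learned mean reward."""
        best = max(range(len(self.factors)), key=lambda i: self.values[i])
        return self.factors[best]


def reward(requested, estimate, actual):
    """Assumed reward: fraction of the request freed, or -1 if the job would fail."""
    if estimate < actual:
        return -1.0  # under-estimate: job fails and must be resubmitted
    return (requested - estimate) / requested


def simulate(num_jobs=2000, seed=1):
    """Run the estimator on synthetic jobs that use 1% to 20% of their request."""
    rng = random.Random(seed)
    estimator = MemoryEstimator(seed=seed + 1)
    for _ in range(num_jobs):
        requested = 1024.0                           # requested memory (arbitrary units)
        actual = requested * rng.uniform(0.01, 0.2)  # synthetic actual usage
        arm = estimator.choose()
        estimate = requested * estimator.factors[arm]
        estimator.update(arm, reward(requested, estimate, actual))
    return estimator


if __name__ == "__main__":
    est = simulate()
    print("learned factor:", est.best_factor())
    print("mean rewards:", [round(v, 3) for v in est.values])
```

In this synthetic setting, the factor 0.25 always covers actual usage while freeing three quarters of the request, so the estimator is expected to settle on it. A real workload would need a richer state, for example job or user similarity groups and the current cluster load.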