Computer grids are complex, heterogeneous, and dynamic systems whose behavior is governed by hundreds of manually-tuned parameters. As the complexity of these systems grows, automating the parameter-tuning procedure becomes indispensable. In this paper, we consider the problem of auto-tuning server capacity, i.e., the number of jobs a server runs in parallel. We present three reinforcement learning algorithms that learn a dynamic policy, adjusting the number of concurrently running jobs according to the job types and the machine state. The algorithms outperform manually-tuned policies over the entire range of tested workloads, with an average throughput improvement greater than 20%. On multi-core servers, the average throughput improvement is approximately 40%, which hints at the enormous potential of such a tuning mechanism as machines gradually transition to multi-core architectures.
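To make the idea concrete, the following is a minimal, self-contained sketch of how a reinforcement learning agent might tune server capacity. It is an illustration only: the tabular Q-learning formulation, the state and action spaces, the reward signal, and the toy environment below are all assumptions made for this example, not the algorithms or workload model described in the paper.

```python
import random
from collections import defaultdict

# Hypothetical sketch: a tabular Q-learning agent that tunes server capacity
# (the number of concurrently running jobs). State = current capacity level;
# actions shrink, keep, or grow it. These choices are illustrative assumptions.

ACTIONS = (-1, 0, 1)   # decrease, keep, or increase the concurrency level
MAX_CAPACITY = 8

class CapacityTuner:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # Q[(state, action)] -> value
        self.alpha = alpha                   # learning rate
        self.gamma = gamma                   # discount factor
        self.epsilon = epsilon               # exploration rate
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update toward the TD target."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

def simulate(tuner, steps=2000):
    """Toy environment (an assumption, not measured data): throughput peaks
    at a capacity of 4, mimicking a server that idles when undersubscribed
    and thrashes when oversubscribed."""
    capacity = 1
    for _ in range(steps):
        state = capacity
        action = tuner.choose(state)
        capacity = min(MAX_CAPACITY, max(1, capacity + action))
        reward = -abs(capacity - 4)          # throughput proxy: best at 4
        tuner.update(state, action, reward, capacity)
    return capacity

tuner = CapacityTuner()
final_capacity = simulate(tuner)
# After training, the greedy policy should hold capacity near the optimum.
```

The same loop structure carries over to richer formulations: a real agent would fold job types and machine metrics (load, memory pressure, core count) into the state and use measured throughput as the reward, rather than the synthetic signal used here.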