Computer grids are complex, heterogeneous, and dynamic systems whose behavior is governed by hundreds of manually-tuned parameters. As the complexity of these systems grows, automating the parameter-tuning procedure becomes indispensable. In this paper, we consider the problem of auto-tuning server capacity, i.e., the number of jobs a server runs in parallel. We present three reinforcement learning algorithms that learn a dynamic policy, adjusting the number of concurrently running jobs according to the job types and the machine state. The algorithms outperform manually-tuned policies over the entire range of tested workloads, with an average throughput improvement greater than 20%. On multi-core servers, the average throughput improvement is approximately 40%, which hints at the enormous potential of such a tuning mechanism as machines gradually transition to multi-core architectures.
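To make the idea concrete, the following is a minimal, self-contained sketch of how a reinforcement learning agent might tune server capacity. It is an illustration only: the tabular Q-learning formulation, the state and action spaces, the reward signal, and the toy environment below are all assumptions made for this example, not the algorithms or workload model described in the paper.

```python
import random
from collections import defaultdict

# Hypothetical sketch: a tabular Q-learning agent that tunes server capacity
# (the number of concurrently running jobs). State = current capacity level;
# actions shrink, keep, or grow it. These choices are illustrative assumptions.

ACTIONS = (-1, 0, 1)   # decrease, keep, or increase the concurrency level
MAX_CAPACITY = 8

class CapacityTuner:
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # Q[(state, action)] -> value
        self.alpha = alpha                   # learning rate
        self.gamma = gamma                   # discount factor
        self.epsilon = epsilon               # exploration rate
        self.rng = random.Random(seed)

    def choose(self, state):
        """Epsilon-greedy action selection."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """One-step Q-learning update toward the TD target."""
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])

def simulate(tuner, steps=2000):
    """Toy environment (an assumption, not measured data): throughput peaks
    at a capacity of 4, mimicking a server that idles when undersubscribed
    and thrashes when oversubscribed."""
    capacity = 1
    for _ in range(steps):
        state = capacity
        action = tuner.choose(state)
        capacity = min(MAX_CAPACITY, max(1, capacity + action))
        reward = -abs(capacity - 4)          # throughput proxy: best at 4
        tuner.update(state, action, reward, capacity)
    return capacity

tuner = CapacityTuner()
final_capacity = simulate(tuner)
# After training, the greedy policy should hold capacity near the optimum.
```

The same loop structure carries over to richer formulations: a real agent would fold job types and machine metrics (load, memory pressure, core count) into the state and use measured throughput as the reward, rather than the synthetic signal used here.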