Improvement of systems management policies using hybrid reinforcement learning

  • Authors:
  • Gerald Tesauro (IBM TJ Watson Research Center, Hawthorne, NY)
  • Nicholas K. Jong (Dept. of Computer Sciences, Univ. of Texas, Austin, TX)
  • Rajarshi Das (IBM TJ Watson Research Center, Hawthorne, NY)
  • Mohamed N. Bennani (Dept. of Computer Science, George Mason Univ., Fairfax, VA)

  • Venue:
  • ECML'06 Proceedings of the 17th European conference on Machine Learning
  • Year:
  • 2006


Abstract

Reinforcement Learning (RL) holds particular promise in the emerging application domain of performance management of computing systems. In recent work, online RL yielded effective server allocation policies in a prototype Data Center, without explicit system models or built-in domain knowledge. This paper presents a substantially improved and more practical “hybrid” approach, in which RL trains offline on data collected while a queuing-theoretic policy controls the system. This approach avoids potentially poor performance during live online training. Additionally, we use nonlinear function approximators instead of tabular value functions; this greatly improves scalability and, surprisingly, eliminates the need for exploratory actions. In experiments using both open-loop and closed-loop traffic as well as large switching delays, our results show significant performance improvement over state-of-the-art queuing model policies.
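The hybrid scheme the abstract describes can be sketched in miniature: a value function is trained offline, by batch temporal-difference updates, on transitions logged while an external (e.g. queuing-model) policy controlled the system. The snippet below is an illustrative sketch only, not the paper's implementation; it uses a linear approximator rather than the paper's nonlinear one, and all feature definitions, reward values, and the toy transition log are invented for the example.

```python
# Hypothetical sketch of offline ("hybrid") RL for server allocation:
# batch SARSA(0) with a linear value-function approximator, trained on
# (state, action, reward, next_state, next_action) tuples logged while
# some other policy controlled the system. Toy data, illustrative only.

GAMMA = 0.9   # discount factor
ALPHA = 0.01  # learning rate

def features(state, action):
    # Toy feature vector: bias, demand level, servers allocated, action.
    demand, servers = state
    return [1.0, demand, servers, float(action)]

def q_value(weights, state, action):
    # Linear approximation: Q(s, a) = w . f(s, a)
    return sum(w * f for w, f in zip(weights, features(state, action)))

def train_offline(transitions, epochs=2000):
    """Batch SARSA on a fixed log of (s, a, r, s', a') tuples."""
    weights = [0.0] * 4
    for _ in range(epochs):
        for s, a, r, s2, a2 in transitions:
            # TD target bootstraps from the logged next action a'.
            target = r + GAMMA * q_value(weights, s2, a2)
            delta = target - q_value(weights, s, a)
            for i, f in enumerate(features(s, a)):
                weights[i] += ALPHA * delta * f
    return weights

# Toy log from a behavior policy: states are (demand, servers); action 1
# ("add a server") earns higher reward under high demand than action 0.
log = [
    ((1.0, 1), 1,  0.5, (1.0, 2), 1),
    ((1.0, 2), 1,  1.0, (1.0, 3), 0),
    ((0.2, 3), 0, -0.5, (0.2, 2), 0),
    ((0.2, 2), 0,  0.2, (0.2, 1), 0),
]

w = train_offline(log)
best = max([0, 1], key=lambda a: q_value(w, (1.0, 2), a))
print("preferred action under high demand:", best)
```

Note that training uses only the logged behavior of the external policy: no exploratory actions are taken against the live system, which is the practical point the abstract emphasizes.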