Improvement of systems management policies using hybrid reinforcement learning

  • Authors:
  • Gerald Tesauro (IBM TJ Watson Research Center, Hawthorne, NY)
  • Nicholas K. Jong (Dept. of Computer Sciences, Univ. of Texas, Austin, TX)
  • Rajarshi Das (IBM TJ Watson Research Center, Hawthorne, NY)
  • Mohamed N. Bennani (Dept. of Computer Science, George Mason Univ., Fairfax, VA)

  • Venue:
  • ECML'06 Proceedings of the 17th European conference on Machine Learning
  • Year:
  • 2006


Abstract

Reinforcement Learning (RL) holds particular promise in the emerging application domain of performance management of computing systems. In recent work, online RL yielded effective server allocation policies in a prototype Data Center, without explicit system models or built-in domain knowledge. This paper presents a substantially improved and more practical “hybrid” approach, in which RL trains offline on data collected while a queuing-theoretic policy controls the system. This approach avoids potentially poor performance during live online training. Additionally, we use nonlinear function approximators instead of tabular value functions; this greatly improves scalability and, surprisingly, eliminates the need for exploratory actions. In experiments using both open-loop and closed-loop traffic as well as large switching delays, our results show significant performance improvement over state-of-the-art queuing model policies.
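The hybrid scheme the abstract describes can be sketched in miniature: a value function is trained offline, by batch temporal-difference updates, on transitions logged while an external (e.g. queuing-model) policy controlled the system. The snippet below is an illustrative sketch only, not the paper's implementation; it uses a linear approximator rather than the paper's nonlinear one, and all feature definitions, reward values, and the toy transition log are invented for the example.

```python
# Hypothetical sketch of offline ("hybrid") RL for server allocation:
# batch SARSA(0) with a linear value-function approximator, trained on
# (state, action, reward, next_state, next_action) tuples logged while
# some other policy controlled the system. Toy data, illustrative only.

GAMMA = 0.9   # discount factor
ALPHA = 0.01  # learning rate

def features(state, action):
    # Toy feature vector: bias, demand level, servers allocated, action.
    demand, servers = state
    return [1.0, demand, servers, float(action)]

def q_value(weights, state, action):
    # Linear approximation: Q(s, a) = w . f(s, a)
    return sum(w * f for w, f in zip(weights, features(state, action)))

def train_offline(transitions, epochs=2000):
    """Batch SARSA on a fixed log of (s, a, r, s', a') tuples."""
    weights = [0.0] * 4
    for _ in range(epochs):
        for s, a, r, s2, a2 in transitions:
            # TD target bootstraps from the logged next action a'.
            target = r + GAMMA * q_value(weights, s2, a2)
            delta = target - q_value(weights, s, a)
            for i, f in enumerate(features(s, a)):
                weights[i] += ALPHA * delta * f
    return weights

# Toy log from a behavior policy: states are (demand, servers); action 1
# ("add a server") earns higher reward under high demand than action 0.
log = [
    ((1.0, 1), 1,  0.5, (1.0, 2), 1),
    ((1.0, 2), 1,  1.0, (1.0, 3), 0),
    ((0.2, 3), 0, -0.5, (0.2, 2), 0),
    ((0.2, 2), 0,  0.2, (0.2, 1), 0),
]

w = train_offline(log)
best = max([0, 1], key=lambda a: q_value(w, (1.0, 2), a))
print("preferred action under high demand:", best)
```

Note that training uses only the logged behavior of the external policy: no exploratory actions are taken against the live system, which is the practical point the abstract emphasizes.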