Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation

  • Authors:
  • Chee Wee Phua; Robert Fitch

  • Affiliations:
  • University of New South Wales, Sydney NSW, Australia; University of New South Wales, Sydney NSW, Australia

  • Venue:
  • Proceedings of the 24th International Conference on Machine Learning
  • Year:
  • 2007

Abstract

Reinforcement learning algorithms can become unstable when combined with linear function approximation. Algorithms that minimize the mean-square Bellman error are guaranteed to converge, but often do so slowly or are computationally expensive. In this paper, we propose to improve the convergence speed of piecewise linear function approximation by tracking the dynamics of the value function with a Kalman filter under a random-walk model. We cast this as a general framework in which we implement the TD, Q-learning, and MAXQ algorithms for different domains, and we report empirical results demonstrating improved learning speed over previous methods.
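
As a rough illustration of the tracking idea described in the abstract, the sketch below applies a scalar Kalman filter with a random-walk model to a tabular TD(0) value estimate. It is a simplified stand-in, not the authors' method: the function names, the toy chain environment, and the noise parameters q and r_var are assumptions for illustration, and the paper itself works with piecewise linear function approximation rather than a table.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): a per-state scalar Kalman
# filter with a random-walk model tracking a tabular TD(0) value function.
# Each state's value V[s] is treated as a hidden quantity that drifts as a
# random walk; the TD target r + gamma * V[s'] acts as its noisy observation.
# The noise variances q and r_var are assumed tuning parameters.

def kalman_td(env_step, n_states, episodes=200, gamma=0.99, q=1e-3, r_var=1.0):
    V = np.zeros(n_states)   # value estimates (filter means)
    P = np.ones(n_states)    # per-state estimate variances
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            s_next, reward, done = env_step(s)
            # Predict step: the random-walk model keeps the mean unchanged
            # but inflates the uncertainty by the process noise q.
            P[s] += q
            # Update step: treat the TD target as a noisy observation of V[s].
            target = reward + (0.0 if done else gamma * V[s_next])
            K = P[s] / (P[s] + r_var)      # Kalman gain
            V[s] += K * (target - V[s])    # innovation-weighted correction
            P[s] *= 1.0 - K                # posterior variance shrinks
            s = s_next
    return V

# Toy 5-state chain: deterministic moves to the right, reward 1 at the end.
def chain_step(s, n=5):
    s_next = s + 1
    done = s_next == n - 1
    return s_next, (1.0 if done else 0.0), done

print(kalman_td(chain_step, n_states=5))
```

In this sketch the Kalman gain acts as a state-dependent, automatically decaying step size; the paper's contribution is to apply this style of tracking to piecewise linear approximators within the TD, Q-learning, and MAXQ algorithms.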