Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation

  • Authors:
  • Chee Wee Phua; Robert Fitch

  • Affiliations:
  • University of New South Wales, Sydney NSW, Australia; University of New South Wales, Sydney NSW, Australia

  • Venue:
  • Proceedings of the 24th International Conference on Machine Learning
  • Year:
  • 2007

Abstract

Reinforcement learning algorithms can become unstable when combined with linear function approximation. Algorithms that minimize the mean-square Bellman error are guaranteed to converge, but often do so slowly or are computationally expensive. In this paper, we propose to improve the convergence speed of piecewise linear function approximation by tracking the dynamics of the value function with a Kalman filter under a random-walk model. We cast this as a general framework in which we implement the TD, Q-learning, and MAXQ algorithms for different domains, and we report empirical results demonstrating improved learning speed over previous methods.
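
As a rough illustration of the tracking idea described in the abstract, the sketch below applies a scalar Kalman filter with a random-walk model to a tabular TD(0) value estimate. It is a simplified stand-in, not the authors' method: the function names, the toy chain environment, and the noise parameters q and r_var are assumptions for illustration, and the paper itself works with piecewise linear function approximation rather than a table.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): a per-state scalar Kalman
# filter with a random-walk model tracking a tabular TD(0) value function.
# Each state's value V[s] is treated as a hidden quantity that drifts as a
# random walk; the TD target r + gamma * V[s'] acts as its noisy observation.
# The noise variances q and r_var are assumed tuning parameters.

def kalman_td(env_step, n_states, episodes=200, gamma=0.99, q=1e-3, r_var=1.0):
    V = np.zeros(n_states)   # value estimates (filter means)
    P = np.ones(n_states)    # per-state estimate variances
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            s_next, reward, done = env_step(s)
            # Predict step: the random-walk model keeps the mean unchanged
            # but inflates the uncertainty by the process noise q.
            P[s] += q
            # Update step: treat the TD target as a noisy observation of V[s].
            target = reward + (0.0 if done else gamma * V[s_next])
            K = P[s] / (P[s] + r_var)      # Kalman gain
            V[s] += K * (target - V[s])    # innovation-weighted correction
            P[s] *= 1.0 - K                # posterior variance shrinks
            s = s_next
    return V

# Toy 5-state chain: deterministic moves to the right, reward 1 at the end.
def chain_step(s, n=5):
    s_next = s + 1
    done = s_next == n - 1
    return s_next, (1.0 if done else 0.0), done

print(kalman_td(chain_step, n_states=5))
```

In this sketch the Kalman gain acts as a state-dependent, automatically decaying step size; the paper's contribution is to apply this style of tracking to piecewise linear approximators within the TD, Q-learning, and MAXQ algorithms.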