A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning

  • Authors:
  • David Choi; Benjamin Van Roy

  • Affiliations:
  • Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, USA 02420-9108; Departments of Management Science and Engineering and Electrical Engineering, Stanford University, Stanford, USA 94305

  • Venue:
  • Discrete Event Dynamic Systems
  • Year:
  • 2006

Abstract

The traditional Kalman filter can be viewed as a recursive stochastic algorithm that approximates an unknown function via a linear combination of prespecified basis functions given a sequence of noisy samples. In this paper, we generalize the algorithm to one that approximates the fixed point of an operator that is known to be a Euclidean norm contraction. Instead of noisy samples of the desired fixed point, the algorithm updates parameters based on noisy samples of functions generated by application of the operator, in the spirit of Robbins–Monro stochastic approximation. The algorithm is motivated by temporal-difference learning, and our developments lead to a possibly more efficient variant of temporal-difference learning. We establish convergence of the algorithm and explore efficiency gains through computational experiments involving optimal stopping and queueing problems.
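
The abstract describes the method only at a high level, so the following Python sketch is an illustration rather than the authors' algorithm: it pairs a recursive least-squares (Kalman-style) gain with Robbins–Monro-style sampling of a contraction operator F, using a linear approximation J(x) ≈ phi(x)ᵀ r. All names (generalized_kalman_fixed_point, phi, sample_F, sample_state) and the toy problem below are hypothetical placeholders, not taken from the paper.

    import numpy as np

    def generalized_kalman_fixed_point(phi, sample_F, sample_state, r0, num_steps):
        # Sketch only (not the paper's exact update): approximate the fixed point
        # J* = F(J*) with J(x) ~ phi(x)^T r, adjusting r from noisy samples of
        # (F(Phi r))(x) observed at randomly sampled states x.
        r = np.array(r0, dtype=float)
        d = r.size
        H = np.eye(d)                      # running sum of feature outer products (RLS-style)
        for t in range(num_steps):
            x = sample_state()             # draw a state at which to sample the operator
            f = phi(x)                     # feature vector of the sampled state, shape (d,)
            y = sample_F(r, x)             # noisy sample of (F(Phi r))(x)
            H += np.outer(f, f)            # accumulate second-moment information
            gain = np.linalg.solve(H, f)   # Kalman/RLS-style gain direction
            r += gain * (y - f @ r)        # correct along the temporal-difference-like error
        return r

    # Hypothetical toy check: F(J) = A J + b with ||A||_2 < 1 is a Euclidean-norm
    # contraction; with tabular (identity) features the estimate should approach
    # the exact fixed point.
    rng = np.random.default_rng(0)
    n = 5
    A = rng.random((n, n))
    A *= 0.9 / np.linalg.norm(A, 2)                   # scale so the spectral norm is below 1
    b = rng.random(n)
    J_star = np.linalg.solve(np.eye(n) - A, b)        # exact fixed point for comparison

    phi = lambda x: np.eye(n)[x]                      # tabular features
    sample_F = lambda r, x: A[x] @ r + b[x] + 0.01 * rng.normal()
    sample_state = lambda: rng.integers(n)

    r_hat = generalized_kalman_fixed_point(phi, sample_F, sample_state, np.zeros(n), 5000)
    print(np.max(np.abs(r_hat - J_star)))             # should be small

Because H grows roughly linearly with the number of samples, the effective step size decays like 1/t, consistent with the Robbins–Monro flavor mentioned in the abstract; when F is a one-step temporal-difference operator, the residual y - f @ r plays the role of a temporal difference.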