Incremental least-squares temporal difference learning

  • Authors:
  • Alborz Geramifard, Michael Bowling, Richard S. Sutton

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada (all authors)

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1
  • Year:
  • 2006


Abstract

Approximate policy evaluation with linear function approximation is a commonly arising problem in reinforcement learning, usually solved using temporal difference (TD) algorithms. In this paper we introduce a new variant of linear TD learning, called incremental least-squares TD learning, or iLSTD. This method is more data efficient than conventional TD algorithms such as TD(0) and is more computationally efficient than non-incremental least-squares TD methods such as LSTD (Bradtke & Barto 1996; Boyan 1999). In particular, we show that the per-time-step complexities of iLSTD and TD(0) are O(n), where n is the number of features, whereas that of LSTD is O(n^2). This difference can be decisive in modern applications of reinforcement learning, where the use of a large number of features has proven to be an effective solution strategy. We present empirical comparisons, using the test problem introduced by Boyan (1999), in which iLSTD converges faster than TD(0) and almost as fast as LSTD.
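
As a rough illustration of the per-step complexity contrast stated in the abstract, the sketch below implements the standard TD(0) update (O(n) per step) and a recursive LSTD update that maintains the inverse of the A matrix with a Sherman-Morrison rank-1 correction (O(n^2) per step). This is a generic sketch of the two baseline algorithms the abstract compares against, not the paper's iLSTD method; the feature count, step size, discount factor, and regularization constant are placeholder values chosen only for illustration.

    # Illustrative per-step updates for TD(0) and recursive LSTD with linear
    # function approximation. NOT the paper's iLSTD algorithm; parameter
    # values below are placeholders.
    import numpy as np

    n = 100        # number of features (the abstract's n)
    alpha = 0.01   # TD(0) step size (assumed)
    gamma = 0.95   # discount factor (assumed)

    def td0_step(theta, phi, r, phi_next):
        # TD(0): a few inner products and one scaled vector add -> O(n) per step.
        delta = r + gamma * phi_next.dot(theta) - phi.dot(theta)  # TD error
        return theta + alpha * delta * phi

    def lstd_step(A_inv, b, phi, r, phi_next):
        # Recursive LSTD: A grows by the rank-1 term phi (phi - gamma*phi')^T,
        # so A^{-1} is updated with Sherman-Morrison and theta is re-solved as
        # A^{-1} b -> O(n^2) per step.
        u = phi - gamma * phi_next
        A_inv_phi = A_inv.dot(phi)          # A^{-1} phi
        u_A_inv = u.dot(A_inv)              # u^T A^{-1}
        A_inv = A_inv - np.outer(A_inv_phi, u_A_inv) / (1.0 + u_A_inv.dot(phi))
        b = b + r * phi
        return A_inv, b, A_inv.dot(b)

    # Tiny usage example on random transitions (illustration only).
    rng = np.random.default_rng(0)
    theta = np.zeros(n)
    A_inv = np.eye(n) / 1e-3   # start from (epsilon * I)^{-1} so A is invertible
    b = np.zeros(n)
    for _ in range(10):
        phi, phi_next = rng.random(n), rng.random(n)
        r = rng.random()
        theta = td0_step(theta, phi, r, phi_next)
        A_inv, b, theta_lstd = lstd_step(A_inv, b, phi, r, phi_next)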