Technical Update: Least-Squares Temporal Difference Learning

  • Authors:
  • Justin A. Boyan

  • Affiliations:
  • ITA Software, 141 Portland Street, Cambridge, MA 02139, USA. jab@itasoftware.com (http://www.boyan.com/justin/)

  • Venue:
  • Machine Learning
  • Year:
  • 2002

Abstract

TD(λ) is a popular family of algorithms for approximate policy evaluation in large MDPs. TD(λ) works by incrementally updating the value function after each observed transition. It has two major drawbacks: it may make inefficient use of data, and it requires the user to manually tune a stepsize schedule for good performance. For the case of linear value function approximations and λ = 0, the Least-Squares TD (LSTD) algorithm of Bradtke and Barto (1996, Machine Learning, 22(1–3), 33–57) eliminates all stepsize parameters and improves data efficiency. This paper updates Bradtke and Barto's work in three significant ways. First, it presents a simpler derivation of the LSTD algorithm. Second, it generalizes from λ = 0 to arbitrary values of λ; at the extreme of λ = 1, the resulting new algorithm is shown to be a practical, incremental formulation of supervised linear regression. Third, it presents a novel and intuitive interpretation of LSTD as a model-based reinforcement learning technique.
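As a rough illustration of the scheme the abstract describes, the sketch below accumulates least-squares statistics A and b over observed transitions and then solves a single linear system for the weights of a linear value function, so no stepsize schedule is involved. This is a minimal sketch, not the paper's own code: it assumes NumPy, a discounted formulation with factor gamma (Boyan's presentation also covers the undiscounted case), and a small ridge term for numerical stability; the class name, parameter names, and the ridge term are illustrative assumptions.

```python
import numpy as np

class LSTDLambda:
    """Minimal LSTD(lambda) sketch: accumulate A and b from observed
    transitions, then solve A w = b for linear value-function weights."""

    def __init__(self, num_features, lam=0.0, gamma=1.0, eps=1e-6):
        self.lam = lam                        # trace-decay parameter (the paper's lambda)
        self.gamma = gamma                    # discount factor (1.0 for the undiscounted case)
        self.A = eps * np.eye(num_features)   # small ridge term (assumption) keeps A invertible
        self.b = np.zeros(num_features)
        self.z = np.zeros(num_features)       # eligibility trace over feature vectors

    def start_episode(self, phi0):
        """Reset the eligibility trace to the first state's features."""
        self.z = phi0.copy()

    def observe(self, phi, reward, phi_next):
        """Fold one transition (s, r, s') into the sufficient statistics."""
        self.A += np.outer(self.z, phi - self.gamma * phi_next)
        self.b += self.z * reward
        self.z = self.gamma * self.lam * self.z + phi_next   # decay and extend the trace

    def weights(self):
        """Solve A w = b; the value estimate is V(s) ~= w . phi(s)."""
        return np.linalg.solve(self.A, self.b)
```

Calling observe once per transition and weights() only when an estimate is needed mirrors the data-efficiency argument above: every transition contributes to A and b exactly once, and the solve replaces the stepsize-tuned incremental updates of ordinary TD(λ).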