Fast gradient-descent methods for temporal-difference learning with linear function approximation

  • Authors:
  • Richard S. Sutton; Hamid Reza Maei; Doina Precup; Shalabh Bhatnagar; David Silver; Csaba Szepesvári; Eric Wiewiora

  • Affiliations:
  • University of Alberta, Edmonton, Canada; University of Alberta, Edmonton, Canada; McGill University, Montreal, Canada; Indian Institute of Science, Bangalore, India; University of Alberta, Edmonton, Canada; University of Alberta, Edmonton, Canada; University of Alberta, Edmonton, Canada

  • Venue:
  • ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
  • Year:
  • 2009

Abstract

Sutton, Szepesvári and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training, and whose complexity scales only linearly in the size of the function approximator. Although their gradient temporal difference (GTD) algorithm converges reliably, it can be very slow compared to conventional linear TD (on on-policy problems where TD is convergent), calling into question its practical utility. In this paper we introduce two new related algorithms with better convergence rates. The first algorithm, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (but still not as fast as conventional TD). The second new algorithm, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional term which is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, the learning rate of this algorithm was comparable to that of conventional TD. This algorithm appears to extend linear TD to off-policy learning with no penalty in performance while only doubling computational requirements.
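To give a concrete sense of the updates the abstract describes, here is a minimal Python sketch of per-transition GTD2 and TDC updates in the on-policy, linear-function-approximation case. The function names, the NumPy phrasing, and the fixed step sizes `alpha` and `beta` are illustrative choices rather than the paper's notation, and off-policy use would additionally require importance-sampling corrections not shown here.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC (linear TD with gradient correction) step on a single transition.

    theta    : primary weight vector (value-function parameters)
    w        : secondary weight vector, initialised to zero
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    """
    delta = reward + gamma * theta.dot(phi_next) - theta.dot(phi)  # TD error
    # Conventional linear-TD update plus a correction term
    # -alpha * gamma * phi_next * (phi^T w), which vanishes while w is zero.
    theta = theta + alpha * (delta * phi - gamma * phi_next * phi.dot(w))
    # Secondary weights track an estimate of the expected TD error given phi.
    w = w + beta * (delta - phi.dot(w)) * phi
    return theta, w

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One GTD2 step on a single transition; shares the w update with TDC."""
    delta = reward + gamma * theta.dot(phi_next) - theta.dot(phi)
    theta = theta + alpha * (phi - gamma * phi_next) * phi.dot(w)
    w = w + beta * (delta - phi.dot(w)) * phi
    return theta, w
```

In the abstract's terms, the `-gamma * phi_next * phi.dot(w)` term in `tdc_update` is the "additional term which is initially zero": since `w` starts at zero, TDC's first updates coincide with those of conventional linear TD, and the second weight vector is what roughly doubles the computational cost.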