Sutton, Szepesvári, and Maei (2009) recently introduced the first temporal-difference learning algorithm compatible with both linear function approximation and off-policy training whose complexity scales only linearly in the size of the function approximator. Although their gradient temporal-difference (GTD) algorithm converges reliably, it can be very slow compared to conventional linear TD (on on-policy problems, where TD is convergent), calling into question its practical utility. In this paper we introduce two new, related algorithms with better convergence rates. The first, GTD2, is derived and proved convergent just as GTD was, but uses a different objective function and converges significantly faster (though still not as fast as conventional TD). The second, linear TD with gradient correction, or TDC, uses the same update rule as conventional TD except for an additional correction term that is initially zero. In our experiments on small test problems and in a Computer Go application with a million features, TDC learned at a rate comparable to that of conventional TD. TDC thus appears to extend linear TD to off-policy learning with no penalty in performance while only doubling the computational requirements.
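To make the two-timescale structure concrete, the following is a minimal NumPy sketch of the GTD2 and TDC updates as they are usually stated, restricted to the on-policy linear case. The function names and step-size arguments (alpha for the main weights, beta for the faster secondary weights) are illustrative choices, and the importance-sampling ratios needed for genuinely off-policy training are omitted for brevity.

import numpy as np

def gtd2_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    # TD error under the current main weights theta
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # Main weights: update direction scaled by the secondary estimate w
    theta = theta + alpha * (phi - gamma * phi_next) * (phi @ w)
    # Secondary weights (faster timescale): LMS-style estimate of the
    # expected TD error given the current features
    w = w + beta * (delta - phi @ w) * phi
    return theta, w

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    # TD error under the current main weights theta
    delta = reward + gamma * (phi_next @ theta) - phi @ theta
    # Conventional linear-TD term (delta * phi) plus a correction term;
    # the correction vanishes when w == 0, so with w initialized to zero
    # TDC initially behaves exactly like conventional TD
    theta = theta + alpha * (delta * phi - gamma * phi_next * (phi @ w))
    # Same secondary estimator as in GTD2
    w = w + beta * (delta - phi @ w) * phi
    return theta, w

Both updates cost O(n) per step in the number of features n; relative to conventional TD, the extra work is one additional length-n weight vector and a few inner products, which is the roughly twofold increase in computation noted above.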