The Convergence of TD(λ) for General λ

Authors:
Peter Dayan
Affiliations:
Centre for Cognitive Science & Department of Physics, University of Edinburgh, EH8 9LW, Scotland. dayan@helmholtz.sdsc.edu
Venue:
Machine Learning
Year:
1992

Abstract

The method of temporal differences (TD) is one way of making consistent predictions about the future. This paper uses an analysis by Watkins (1989) to extend a convergence theorem due to Sutton (1988) from the case that uses information only from adjacent time steps (TD(0)) to the case involving information from arbitrarily distant ones (general λ). It also considers how this version of TD behaves when the representations of states are linearly dependent, demonstrating that it still converges, but to a different answer from the least mean squares algorithm. Finally, it adapts Watkins' theorem that Q-learning, his closely related prediction and action learning method, converges with probability one, to demonstrate this strong form of convergence for a slightly modified version of TD.
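The TD(λ) rule whose convergence the paper studies can be illustrated with a small sketch. This is not code from the paper, and the setup is an assumption chosen for illustration: Sutton's (1988) bounded random walk (five nonterminal states, outcome 1 at the right end and 0 at the left), one-hot features (so the state representation is linearly independent), and a decaying step size α_k = 10/(k + 100) picked by us. Weight changes Δw_t = α(P_{t+1} - P_t) Σ_{k≤t} λ^{t-k} ∇_w P_k are summed over each sequence with w held fixed and applied at its end, with the final prediction replaced by the observed outcome z. All function names are hypothetical.

```python
import random

N_STATES = 5  # nonterminal states of the bounded random walk (assumed setup)


def features(s):
    """One-hot feature vector for nonterminal state s (a linearly independent representation)."""
    x = [0.0] * N_STATES
    x[s] = 1.0
    return x


def generate_sequence(rng):
    """Walk from the middle state until absorption; return visited states and outcome z."""
    s = N_STATES // 2
    states = [s]
    while True:
        s += rng.choice((-1, 1))
        if s < 0:
            return states, 0.0  # absorbed at the left end: outcome 0
        if s >= N_STATES:
            return states, 1.0  # absorbed at the right end: outcome 1
        states.append(s)


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def td_lambda_sequence_update(w, states, z, lam, alpha):
    """Sum the TD(lambda) weight changes over one sequence with w held fixed, then apply them."""
    trace = [0.0] * len(w)  # e_t = lam * e_{t-1} + grad_w P_t; for linear P_t the gradient is x_t
    dw = [0.0] * len(w)
    for t, s in enumerate(states):
        x = features(s)
        p_t = dot(w, x)
        # After the last nonterminal state, the "next prediction" is the observed outcome z.
        p_next = dot(w, features(states[t + 1])) if t + 1 < len(states) else z
        trace = [lam * e + xi for e, xi in zip(trace, x)]
        delta = p_next - p_t
        dw = [d + alpha * delta * e for d, e in zip(dw, trace)]
    return [wi + d for wi, d in zip(w, dw)]


def run(lam, n_sequences=20000, seed=0):
    """Train from w = 0.5 everywhere with the assumed step size 10 / (k + 100)."""
    rng = random.Random(seed)
    w = [0.5] * N_STATES
    for k in range(1, n_sequences + 1):
        states, z = generate_sequence(rng)
        w = td_lambda_sequence_update(w, states, z, lam, 10.0 / (k + 100))
    return w
```

Calling run(lam) for λ = 0, 0.5 or 1 should leave each weight close to the probability of ending at the right, (i + 1)/6 for state index i, consistent with the convergence results for linearly independent representations. With λ = 1 the summed update reduces to the Widrow-Hoff (least mean squares) rule α Σ_t (z - P_t) x_t, which is the comparison point the abstract uses for linearly dependent representations.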