The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow an agent to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that, for a special case of temporal differences, the expected values of the predictions converge to their correct values as larger samples are taken, and Dayan (1992) extended his proof to the general case. This article proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.