Dynamics of temporal difference learning

Authors:
Andreas Wendemuth
Affiliations:
Otto-von-Guericke-University, Magdeburg, Germany and Cognitive Systems Group
Venue:
IJCAI'07: Proceedings of the 20th International Joint Conference on Artificial Intelligence
Year:
2007

Abstract

In behavioural sciences, one considers the problem in which a sequence of stimuli is followed by a sequence of rewards r(t). The subject's task is to learn the full sequence of rewards from the stimuli, where the prediction is modelled by the Sutton-Barto rule. Over a sequence of n trials, this prediction rule is learned iteratively by temporal difference learning. We present a closed-form expression for the prediction of rewards at trial time t within trial n. From that formula, we show directly that the predictions converge to the true rewards as n → ∞. In the course of this derivation, a new property of correlation-type Toeplitz matrices is proven. We also give learning rates that optimally speed up the learning process.
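
To make the setting concrete, the following is a minimal sketch of iterative temporal difference learning over repeated trials. It is not the closed formula of the paper. It assumes the common textbook formulation in which the prediction is a weighted sum over past stimuli, v(t) = Σ_τ w(τ) u(t − τ), with no discounting, and in which v(t) estimates the total reward still to come in the trial, so that the implied per-step reward estimate is v(t) − v(t+1). The names `td_learn`, `stimulus`, `eps` and the example sequences are illustrative assumptions, not taken from the paper.

```python
def td_learn(rewards, stimulus, n_trials, eps):
    """Repeat one trial n_trials times and return the learned predictions v(t).

    Assumed (textbook) model, not necessarily the paper's exact one:
      v(t)     = sum_{tau=0}^{t} w(tau) * u(t - tau)
      delta(t) = r(t) + v(t + 1) - v(t)          (TD error, undiscounted)
      w(tau)  += eps * delta(t) * u(t - tau)
    """
    T = len(rewards)
    w = [0.0] * T  # one weight per time lag tau

    def predict(t):
        # The prediction is defined as 0 after the end of the trial.
        if t >= T:
            return 0.0
        return sum(w[tau] * stimulus[t - tau] for tau in range(t + 1))

    for _ in range(n_trials):
        for t in range(T):
            delta = rewards[t] + predict(t + 1) - predict(t)
            for tau in range(t + 1):
                w[tau] += eps * delta * stimulus[t - tau]

    return [predict(t) for t in range(T)]


# Hypothetical example: a single stimulus pulse at t = 0, rewards at t = 2 and t = 3.
rewards = [0.0, 0.0, 1.0, 2.0, 0.0]
stimulus = [1.0, 0.0, 0.0, 0.0, 0.0]

v = td_learn(rewards, stimulus, n_trials=200, eps=0.5)

# Per-step reward estimate implied by the predictions: v(t) - v(t + 1).
implied = [v[t] - (v[t + 1] if t + 1 < len(v) else 0.0) for t in range(len(v))]

print([round(x, 3) for x in v])
print([round(x, 3) for x in implied])
```

Under these assumptions, the predictions should approach the remaining reward at each step (3, 3, 3, 2, 0 for this example) and the implied per-step estimates should approach r(t) as the number of trials grows. The learning rate `eps = 0.5` is an arbitrary choice for illustration and is not the optimal rate derived in the paper.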