Stochastic systems: estimation, identification and adaptive control
Adaptive algorithms and stochastic approximations
The Convergence of TD(λ) for General λ. Machine Learning.
TD(λ) Converges with Probability 1. Machine Learning.
Feature-based methods for large scale dynamic programming. Machine Learning (special issue on reinforcement learning).
Dynamic Programming and Optimal Control
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Learning to Predict by the Methods of Temporal Differences. Machine Learning.
On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation.
This paper analyzes the asymptotic mean-square behavior of constant-stepsize temporal-difference algorithms. The analysis covers linear cost-to-go function approximation and Markov chains with an uncountable state space. An asymptotic upper bound, expressed as a function of the stepsize, is derived for the mean-square deviation of the algorithm's iterates from the optimal parameter value of the cost-to-go approximator achievable by temporal-difference learning.
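The setting the abstract describes, constant-stepsize temporal-difference learning with a linear cost-to-go approximator, can be sketched as follows. This is a minimal illustration on a hypothetical finite Markov chain (the paper itself treats uncountable state spaces); the feature choice, parameter values, and all names are assumptions for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical toy setup: a 5-state ergodic Markov chain with a random
# row-stochastic transition matrix, per-state costs, and a discount factor.
rng = np.random.default_rng(0)
n_states, gamma, lam, alpha = 5, 0.9, 0.7, 0.05  # alpha: constant stepsize

P = rng.dirichlet(np.ones(n_states), size=n_states)  # rows sum to 1
costs = rng.standard_normal(n_states)

# Linear cost-to-go approximation J(s) ~ phi(s) @ theta; one-hot features
# make this the tabular special case, so theta estimates J directly.
phi = np.eye(n_states)

def td_lambda(n_steps=50_000):
    """Constant-stepsize TD(lambda) with an accumulating eligibility trace."""
    theta = np.zeros(n_states)
    z = np.zeros(n_states)  # eligibility trace
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the observed transition.
        delta = costs[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        z = gamma * lam * z + phi[s]       # decay and accumulate the trace
        theta = theta + alpha * delta * z  # constant-stepsize update
        s = s_next
    return theta

# Exact solution of the Bellman equation J = c + gamma * P @ J,
# for comparison with the TD iterate.
J_exact = np.linalg.solve(np.eye(n_states) - gamma * P, costs)
theta_hat = td_lambda()
mse = np.mean((theta_hat - J_exact) ** 2)
```

With a constant stepsize the iterates do not converge to the exact solution; they fluctuate around it, and the paper's result is an asymptotic upper bound on the mean-square size of that fluctuation as a function of the stepsize.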