Asymptotic analysis of temporal-difference learning algorithms with constant step-sizes

Authors:
Vladislav B. Tadić
Affiliations:
Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield, United Kingdom S1 3JD
Venue:
Machine Learning
Year:
2006

Abstract

This paper analyzes the mean-square asymptotic behavior of temporal-difference learning algorithms with constant step-sizes and linear function approximation. The analysis is carried out for a discounted cost function associated with a Markov chain with a finite-dimensional state-space. Under mild conditions, an upper bound on the asymptotic mean-square error of these algorithms is determined as a function of the step-size. Under the same assumptions, this bound is also shown to be linear in the step-size. The main results are illustrated with examples related to M/G/1 queues and nonlinear autoregressive (AR) models with Markov switching.
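The class of algorithms described above can be illustrated with a small simulation. What follows is a minimal, hypothetical sketch, not the paper's construction or notation. It assumes a 3-state Markov chain (a special case of the finite-dimensional state-spaces the analysis covers), made-up costs `COST` and features `PHI`, and the standard TD(λ) recursion θ(n+1) = θ(n) + α·δ(n)·e(n), with eligibility trace e(n) = γλ·e(n-1) + φ(X(n)) and temporal difference δ(n) = c(X(n)) + γ·φ(X(n+1))ᵀθ(n) - φ(X(n))ᵀθ(n). The reference point θ* is assumed to be the usual TD(λ) limit, i.e. the solution of A·θ* + b = 0 for the stationary mean update. All helper names (`solve`, `stationary`, `td_lambda_limit`, `td_lambda_mse`) are illustrative.

```python
import random

# Hypothetical example data (not from the paper): a doubly stochastic 3-state chain,
# per-state costs c(x), and 2-dimensional features phi(x).
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
COST = [1.0, 0.0, 2.0]
PHI = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # phi(x) for x = 0, 1, 2
GAMMA, LAM = 0.9, 0.5  # discount factor and trace-decay parameter


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def stationary(p, iters=500):
    """Stationary distribution of p by power iteration (p assumed ergodic)."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def td_lambda_limit(p, cost, phi, gamma, lam):
    """Return (theta*, A, b) with A theta* + b = 0, where
    A = Phi^T D ((1-lam) gamma P (I - lam gamma P)^{-1} - I) Phi,
    b = Phi^T D (I - lam gamma P)^{-1} c, and D = diag(stationary distribution)."""
    n, d = len(p), len(phi[0])
    pi = stationary(p)
    B = [[(1.0 if i == j else 0.0) - lam * gamma * p[i][j] for j in range(n)]
         for i in range(n)]
    bc = solve(B, cost)  # (I - lam gamma P)^{-1} c
    z_cols = []
    for k in range(d):  # column k of ((1-lam) gamma P B^{-1} - I) Phi
        y = solve(B, [phi[s][k] for s in range(n)])
        py = [sum(p[i][j] * y[j] for j in range(n)) for i in range(n)]
        z_cols.append([(1 - lam) * gamma * py[i] - phi[i][k] for i in range(n)])
    A = [[sum(phi[s][r] * pi[s] * z_cols[k][s] for s in range(n)) for k in range(d)]
         for r in range(d)]
    b = [sum(phi[s][r] * pi[s] * bc[s] for s in range(n)) for r in range(d)]
    return solve(A, [-v for v in b]), A, b


def td_lambda_mse(alpha, steps, burn_in, theta_star, seed=0):
    """Run TD(lambda) with constant step-size alpha along one simulated path and
    return the time-averaged squared error |theta_n - theta*|^2 after burn-in."""
    rng = random.Random(seed)
    n, d = len(P), len(PHI[0])
    theta, trace = [0.0] * d, [0.0] * d
    x, acc = 0, 0.0
    for t in range(burn_in + steps):
        x_next = rng.choices(range(n), weights=P[x])[0]  # sample X(n+1) given X(n)
        f, f_next = PHI[x], PHI[x_next]
        trace = [GAMMA * LAM * e + fi for e, fi in zip(trace, f)]  # eligibility trace
        v = sum(fi * th for fi, th in zip(f, theta))
        v_next = sum(fi * th for fi, th in zip(f_next, theta))
        delta = COST[x] + GAMMA * v_next - v  # temporal difference
        theta = [th + alpha * delta * e for th, e in zip(theta, trace)]
        if t >= burn_in:
            acc += sum((th - ts) ** 2 for th, ts in zip(theta, theta_star))
        x = x_next
    return acc / steps


if __name__ == "__main__":
    theta_star, _, _ = td_lambda_limit(P, COST, PHI, GAMMA, LAM)
    for alpha in (0.01, 0.02, 0.04):
        mse = td_lambda_mse(alpha, steps=100_000, burn_in=20_000, theta_star=theta_star)
        print(f"alpha={alpha:.2f}  mse={mse:.5f}  mse/alpha={mse / alpha:.3f}")
```

Running the script prints a time-averaged squared error for three step-sizes. If the asymptotic error grows roughly linearly in α, the printed mse/alpha ratio should stay of similar magnitude across rows; a single finite path only gives a noisy estimate, and the sketch says nothing about the constants in the paper's bound.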