On the Convergence of Temporal-Difference Learning with Linear Function Approximation

Authors:
Vladislav Tadić
Affiliations:
Department of Electrical and Electronic Engineering, The University of Melbourne, Parkville, Victoria 3010, Australia. v.tadic@ee.mu.oz.au
Venue:
Machine Learning
Year:
2001

Abstract

This paper analyzes the asymptotic properties of temporal-difference learning algorithms with linear function approximation. The analysis is carried out in the context of approximating a discounted cost-to-go function associated with an uncontrolled Markov chain with an uncountable finite-dimensional state space. Under mild conditions, the almost sure convergence of temporal-difference learning algorithms with linear function approximation is established, and an upper bound for their asymptotic approximation error is determined. The results generalize and extend existing results on the asymptotic behavior of temporal-difference learning. Moreover, they cover cases to which the existing results cannot be applied, while the adopted assumptions seem to be the weakest possible under which the almost sure convergence of temporal-difference learning algorithms can still be demonstrated.
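
For readers unfamiliar with the setting, the following is a minimal sketch of the standard TD(λ) recursion with linear function approximation, applied to a toy uncontrolled Markov chain on the interval [-1, 1] (an uncountable state space). It is not taken from the paper: the chain, the cost g(x) = x^2, the feature map (1, x^2), the step-size schedule, and all function and parameter names are assumptions chosen only for illustration.

```python
import random


def features(x):
    """Hypothetical feature map phi(x) = (1, x^2) for a scalar state x."""
    return [1.0, x * x]


def cost(x):
    """Hypothetical one-step cost g(x) = x^2."""
    return x * x


def td_lambda_step(theta, z, x, x_next, step, alpha, lam):
    """One TD(lambda) update with linear approximation theta . phi(x).

    alpha is the discount factor, lam the trace-decay parameter.
    Returns the updated (theta, z).
    """
    phi, phi_next = features(x), features(x_next)
    # Eligibility trace: z <- alpha * lam * z + phi(x).
    z = [alpha * lam * zi + pi for zi, pi in zip(z, phi)]
    # Temporal difference: d = g(x) + alpha * theta.phi(x') - theta.phi(x).
    v = sum(t * p for t, p in zip(theta, phi))
    v_next = sum(t * p for t, p in zip(theta, phi_next))
    d = cost(x) + alpha * v_next - v
    # Parameter update: theta <- theta + step * d * z.
    theta = [t + step * d * zi for t, zi in zip(theta, z)]
    return theta, z


def run_td_lambda(n_steps=5000, alpha=0.9, lam=0.5, a=0.5, seed=0):
    """Run TD(lambda) along one trajectory of the assumed toy chain.

    Assumed dynamics: x' = a * x + w with w uniform on [-(1 - a), 1 - a],
    which keeps the state inside [-1, 1] when started at 0.
    """
    rng = random.Random(seed)
    theta, z, x = [0.0, 0.0], [0.0, 0.0], 0.0
    for n in range(1, n_steps + 1):
        x_next = a * x + rng.uniform(-(1.0 - a), 1.0 - a)
        step = 1.0 / (n + 10)  # assumed diminishing step-size schedule
        theta, z = td_lambda_step(theta, z, x, x_next, step, alpha, lam)
        x = x_next
    return theta


if __name__ == "__main__":
    print(run_td_lambda())
```

The sketch only shows the mechanics of the recursion (trace, temporal difference, parameter update). How quickly the iterates settle depends on the assumed step sizes and features, and nothing here reproduces the conditions or the error bound established in the paper.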