This paper analyzes the asymptotic properties of temporal-difference learning algorithms with linear function approximation. The analysis is carried out in the context of approximating the discounted cost-to-go function of an uncontrolled Markov chain with an uncountable, finite-dimensional state space. Under mild conditions, the almost sure convergence of temporal-difference learning algorithms with linear function approximation is established, and an upper bound on their asymptotic approximation error is derived. These results generalize and extend existing results on the asymptotic behavior of temporal-difference learning and cover cases to which the existing results do not apply, while the adopted assumptions appear to be the weakest under which almost sure convergence of temporal-difference learning algorithms can still be demonstrated.
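For orientation, the algorithm class discussed above can be sketched as follows. This is a minimal, generic TD(λ) update with linear function approximation and eligibility traces, not the paper's specific construction: the feature map, Markov chain dynamics, cost function, and step-size schedule below are illustrative assumptions.

```python
import numpy as np

def td_lambda(phi, next_state, cost, x0, dim, gamma=0.9, lam=0.7,
              n_steps=5000, seed=0):
    """Generic TD(lambda) sketch: estimate theta so that phi(x) @ theta
    approximates the discounted cost-to-go of an uncontrolled chain."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    z = np.zeros(dim)                  # eligibility trace
    x = x0
    for t in range(n_steps):
        a = 1.0 / (t + 1)              # step sizes satisfying the usual
                                       # square-summability conditions
        x_next = next_state(x, rng)    # one transition of the chain
        # temporal-difference error for the observed transition
        delta = cost(x) + gamma * phi(x_next) @ theta - phi(x) @ theta
        z = gamma * lam * z + phi(x)   # decay and accumulate the trace
        theta = theta + a * delta * z  # stochastic-approximation update
        x = x_next
    return theta

# Toy illustration (hypothetical): a chain on a continuous state space
# with contracting dynamics, affine features, and cost(x) = x.
phi = lambda x: np.array([1.0, x])
next_state = lambda x, rng: 0.5 * x + 0.1 * rng.uniform()
cost = lambda x: x
theta = td_lambda(phi, next_state, cost, x0=0.5, dim=2)
```

Here the state space is uncountable (a subset of the real line), which is the setting the paper's convergence results address; the classical results for finite or countable chains would not directly apply.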