Stochastic systems: estimation, identification and adaptive control
Adaptive algorithms and stochastic approximations
The Convergence of TD(λ) for General λ. Machine Learning.
TD(λ) Converges with Probability 1. Machine Learning.
Feature-based methods for large scale dynamic programming. Machine Learning (special issue on reinforcement learning).
Dynamic Programming and Optimal Control
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Learning to Predict by the Methods of Temporal Differences. Machine Learning.
On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation.
This paper analyzes the asymptotic mean-square behavior of constant-stepsize temporal-difference algorithms. The analysis covers linear cost-to-go function approximation and Markov chains with an uncountable state space. An asymptotic upper bound, expressed as a function of the stepsize, is derived for the mean-square deviation of the algorithm's iterates from the optimal parameter value of the cost-to-go approximator achievable by temporal-difference learning.
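The setting the abstract describes, constant-stepsize temporal-difference learning with a linear cost-to-go approximator, can be sketched as follows. This is a minimal illustration on a hypothetical finite Markov chain (the paper itself treats uncountable state spaces); the feature choice, parameter values, and all names are assumptions for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical toy setup: a 5-state ergodic Markov chain with a random
# row-stochastic transition matrix, per-state costs, and a discount factor.
rng = np.random.default_rng(0)
n_states, gamma, lam, alpha = 5, 0.9, 0.7, 0.05  # alpha: constant stepsize

P = rng.dirichlet(np.ones(n_states), size=n_states)  # rows sum to 1
costs = rng.standard_normal(n_states)

# Linear cost-to-go approximation J(s) ~ phi(s) @ theta; one-hot features
# make this the tabular special case, so theta estimates J directly.
phi = np.eye(n_states)

def td_lambda(n_steps=50_000):
    """Constant-stepsize TD(lambda) with an accumulating eligibility trace."""
    theta = np.zeros(n_states)
    z = np.zeros(n_states)  # eligibility trace
    s = 0
    for _ in range(n_steps):
        s_next = rng.choice(n_states, p=P[s])
        # Temporal-difference error for the observed transition.
        delta = costs[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        z = gamma * lam * z + phi[s]       # decay and accumulate the trace
        theta = theta + alpha * delta * z  # constant-stepsize update
        s = s_next
    return theta

# Exact solution of the Bellman equation J = c + gamma * P @ J,
# for comparison with the TD iterate.
J_exact = np.linalg.solve(np.eye(n_states) - gamma * P, costs)
theta_hat = td_lambda()
mse = np.mean((theta_hat - J_exact) ** 2)
```

With a constant stepsize the iterates do not converge to the exact solution; they fluctuate around it, and the paper's result is an asymptotic upper bound on the mean-square size of that fluctuation as a function of the stepsize.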