On the Asymptotic Behaviour of a Constant Stepsize Temporal-Difference Learning Algorithm

  • Authors: Vladislav Tadic
  • Affiliations: -
  • Venue: EuroCOLT '99 Proceedings of the 4th European Conference on Computational Learning Theory
  • Year: 1999

Abstract

The mean-square asymptotic behavior of constant stepsize temporal-difference algorithms is analyzed in this paper. The analysis is carried out for the case of a linear (cost-to-go) function approximation and for the case of Markov chains with an uncountable state space. An asymptotic upper bound for the mean-square deviation of the algorithm iterates from the optimal value of the parameter of the (cost-to-go) function approximator achievable by temporal-difference learning is determined as a function of the stepsize.
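
To make the setting of the abstract concrete, the following is a minimal illustrative sketch, not the paper's algorithm or proof. It runs TD(0), used here as the simplest temporal-difference variant, with a constant stepsize and a linear cost-to-go approximator on a toy Markov chain with the uncountable state space [0, 1]. Everything specific in it is an assumption made for illustration: the chain x' = a·x + (1 − a)·u with u uniform on [0, 1], the cost g(x) = x, the features (1, x), the discount factor `beta`, and the function name `td0_mean_square_deviation`. For this toy chain the cost-to-go is exactly linear in x, so the optimal parameter is available in closed form and the time-averaged squared deviation of the iterates from it can be measured directly.

```python
import random


def td0_mean_square_deviation(stepsize, n_burn_in=100_000, n_avg=100_000,
                              beta=0.9, a=0.5, seed=0):
    """Estimate the steady-state mean-square deviation of constant-stepsize
    TD(0) iterates from the optimal parameter on a toy chain.

    All modelling choices (chain, cost, features) are illustrative
    assumptions, not taken from the paper.
    """
    rng = random.Random(seed)

    # Closed-form optimal parameter for this toy problem:
    # J(x) = sum_k beta^k E[x_k | x_0 = x] = c0 + c1 * x.
    c1 = 1.0 / (1.0 - beta * a)
    c0 = 0.5 * (1.0 / (1.0 - beta) - c1)

    theta0, theta1 = 0.0, 0.0   # parameters for the features (1, x)
    x = rng.random()            # initial state in [0, 1]
    sq_dev_sum = 0.0

    for k in range(n_burn_in + n_avg):
        # Chain transition; the state stays in [0, 1].
        x_next = a * x + (1.0 - a) * rng.random()

        # Temporal difference with cost g(x) = x.
        td = x + beta * (theta0 + theta1 * x_next) - (theta0 + theta1 * x)

        # Constant-stepsize update along the feature vector (1, x).
        theta0 += stepsize * td
        theta1 += stepsize * td * x
        x = x_next

        # After the burn-in, accumulate the squared deviation from theta*.
        if k >= n_burn_in:
            sq_dev_sum += (theta0 - c0) ** 2 + (theta1 - c1) ** 2

    return sq_dev_sum / n_avg


if __name__ == "__main__":
    for gamma in (0.1, 0.01):
        print(gamma, td0_mean_square_deviation(gamma))
```

Running the sketch with two stepsizes gives a rough empirical view of how the residual mean-square deviation depends on the stepsize in this toy case. It says nothing about the form or tightness of the bound derived in the paper.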