Finite time bounds for sampling based fitted value iteration

  • Authors: Csaba Szepesvári; Rémi Munos

  • Affiliations: Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary; Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau Cedex, France

  • Venue: ICML '05: Proceedings of the 22nd International Conference on Machine Learning
  • Year: 2005

Abstract

In this paper we consider sampling-based fitted value iteration for discounted, large (possibly infinite) state-space, finite-action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available. At each step, the image of the current estimate of the optimal value function under a Monte-Carlo approximation to the Bellman operator is projected onto some function space. PAC-style bounds on the weighted L^p-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number, and the sample size.
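
The iteration described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional MDP on [0, 1], the hypothetical `generative_model` function, the discount factor, the uniform sampling distribution, and the choice of piecewise-constant functions as the function space (for which the least-squares projection reduces to a per-bin average). Each sweep draws base states, forms a Monte-Carlo estimate of the Bellman backup at each of them, and projects those targets onto the function space.

```python
import random

GAMMA = 0.9             # discount factor (assumed)
ACTIONS = (-0.1, 0.1)   # finite action set: step left or right (toy MDP, assumed)
N_BINS = 10             # function space: piecewise-constant functions on 10 bins


def generative_model(x, a, rng):
    """Hypothetical generative model: sample (reward, next_state) from state x."""
    y = min(1.0, max(0.0, x + a + rng.gauss(0.0, 0.02)))
    reward = 1.0 - abs(y - 0.5)  # reward is highest near the centre of [0, 1]
    return reward, y


def bin_index(x):
    """Index of the bin containing x; x = 1.0 falls into the last bin."""
    return min(N_BINS - 1, int(x * N_BINS))


def backup_estimate(weights, x, n_next, rng):
    """Monte-Carlo estimate of the Bellman backup of the current fit at x."""
    best = float("-inf")
    for a in ACTIONS:
        total = 0.0
        for _ in range(n_next):
            r, y = generative_model(x, a, rng)
            total += r + GAMMA * weights[bin_index(y)]
        best = max(best, total / n_next)
    return best


def fitted_value_iteration(n_states=200, n_next=10, n_iters=30, seed=0):
    rng = random.Random(seed)
    weights = [0.0] * N_BINS  # initial value estimate: identically zero
    for _ in range(n_iters):
        sums = [0.0] * N_BINS
        counts = [0] * N_BINS
        for _ in range(n_states):
            x = rng.random()  # base state drawn uniformly (assumed distribution)
            b = bin_index(x)
            sums[b] += backup_estimate(weights, x, n_next, rng)
            counts[b] += 1
        # Least-squares projection onto piecewise constants is the per-bin mean;
        # a bin that received no samples keeps its previous value.
        weights = [sums[b] / counts[b] if counts[b] else weights[b]
                   for b in range(N_BINS)]
    return weights


if __name__ == "__main__":
    for b, w in enumerate(fitted_value_iteration()):
        print(f"bin {b}: {w:.3f}")
```

The three quantities the bounds depend on appear as explicit knobs in this sketch: the richness of the function space (`N_BINS`), the iteration number (`n_iters`), and the sample sizes (`n_states`, `n_next`). A squared-error fit is used only because it has a closed form here; other exponents p would require a different fitting step.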