Finite time bounds for sampling based fitted value iteration

Authors:
Csaba Szepesvári; Rémi Munos
Affiliations:
Computer and Automation Research Institute of the Hungarian Academy of Sciences, Budapest, Hungary; Centre de Mathématiques Appliquées, Ecole Polytechnique, Palaiseau Cedex, France
Venue:
ICML '05 Proceedings of the 22nd international conference on Machine learning
Year:
2005

Abstract

In this paper we consider sampling-based fitted value iteration for discounted Markovian Decision Problems with large (possibly infinite) state spaces and finite action sets, where only a generative model of the transition probabilities and rewards is available. At each step, the image of the current estimate of the optimal value function under a Monte Carlo approximation of the Bellman operator is projected onto some function space. PAC-style bounds on the weighted Lp-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number, and the sample size.
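
The procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy one-dimensional simulator `generative_model`, the polynomial function space in `features`, the uniform sampling distribution over states, and the choice p = 2 (ordinary least squares) for the projection step. Each iteration draws base states, uses the generative model to form a Monte Carlo estimate of the Bellman backup at each state, and fits the next value estimate to those targets.

```python
import numpy as np


def generative_model(rng, x, a):
    """Hypothetical toy simulator: sample (reward, next state) for states x under action a."""
    noise = 0.05 * rng.standard_normal(x.shape)
    y = np.clip(x + 0.1 * a + noise, 0.0, 1.0)  # next states stay in [0, 1]
    r = 1.0 - (y - 0.5) ** 2                    # rewards lie in [0.75, 1]
    return r, y


def features(x, degree=4):
    """Assumed function space: polynomials of the given degree on [0, 1]."""
    return np.vander(x, degree + 1, increasing=True)


def fitted_value_iteration(n_states=200, n_next=20, n_iters=30,
                           gamma=0.9, actions=(-1.0, 1.0), seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(features(np.zeros(1)).shape[1])  # V_0 = 0
    for _ in range(n_iters):
        # Base states drawn from an assumed uniform sampling distribution.
        x = rng.uniform(0.0, 1.0, size=n_states)
        q = []
        for a in actions:
            # n_next simulated transitions per base state and action.
            xs = np.repeat(x, n_next)
            r, y = generative_model(rng, xs, a)
            backup = r + gamma * features(y) @ w
            # Monte Carlo estimate of the expected one-step backup.
            q.append(backup.reshape(n_states, n_next).mean(axis=1))
        # Empirical Bellman backup: maximize over the finite action set.
        targets = np.max(q, axis=0)
        # Projection onto the function space (least squares, i.e. p = 2).
        w, *_ = np.linalg.lstsq(features(x), targets, rcond=None)
    return w


if __name__ == "__main__":
    w = fitted_value_iteration()
    grid = np.linspace(0.0, 1.0, 5)
    print(features(grid) @ w)  # fitted value estimates on a small grid
```

In this sketch the three quantities named in the bounds appear as explicit knobs: `n_states` and `n_next` set the sample size, `n_iters` sets the iteration number, and the polynomial degree controls the richness of the function space.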