In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) for solving infinite state-space, discounted-reward Markov decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results take the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence-rate results allow us to show that both versions of FVI are well behaved in the sense that, for a large class of MDPs, arbitrarily good performance can be achieved with high probability by using a sufficiently large number of samples. An important feature of our proof technique is that it permits the study of weighted Lp-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds depend on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation power of the function space, the inherent Bellman residual, which reflects how well the function space is "aligned" with the dynamics and rewards of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings.
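
To make the procedure analyzed above concrete, here is a minimal sketch of one sampling-based FVI loop in Python. The names are illustrative assumptions, not the paper's specification: it presumes a generative model exposed as a function sample(state, action) returning a (next_state, reward) pair, a finite action set, and a scikit-learn-style regressor standing in for the function space (the paper's analysis covers a much broader class of approximators).

```python
# Hypothetical sketch of sampling-based fitted value iteration (FVI).
# Assumed interface (not from the paper): sample(state, action) -> (next_state, reward)
# is the generative model; `states` is an (n_states, state_dim) array of sampled
# states; `actions` is a finite iterable of actions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_value_iteration(sample, states, actions, gamma=0.95,
                           n_iterations=50, n_next_samples=10):
    """Repeat V_{k+1} ~ fit(T V_k), where T is the Bellman optimality
    operator, estimated by Monte Carlo over the generative model."""
    X = np.asarray(states)                    # regression inputs, shape (n, d)
    V = lambda S: np.zeros(S.shape[0])        # V_0 = 0
    for _ in range(n_iterations):
        targets = np.empty(X.shape[0])
        for i, s in enumerate(X):
            q_estimates = []
            for a in actions:
                # Monte Carlo estimate of r(s, a) + gamma * E[V(s')]
                total = 0.0
                for _ in range(n_next_samples):
                    s_next, r = sample(s, a)  # generative-model call
                    total += r + gamma * V(np.asarray(s_next).reshape(1, -1))[0]
                q_estimates.append(total / n_next_samples)
            targets[i] = max(q_estimates)     # greedy backup over actions
        # Regression step: project the sampled Bellman backup onto the
        # function space (an extra-trees regressor as a stand-in here).
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
        V = model.predict
    return V
```

The final regression step in each iteration is the projection onto the function space whose quality the inherent Bellman residual quantifies, while the Monte Carlo averaging over n_next_samples draws is where the finite-sample error enters the bounds.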