In this paper we develop a theoretical analysis of the performance of sampling-based fitted value iteration (FVI) for solving infinite state-space, discounted-reward Markov decision processes (MDPs) under the assumption that a generative model of the environment is available. Our main results take the form of finite-time bounds on the performance of two versions of sampling-based FVI. The convergence-rate results allow us to show that both versions of FVI are well behaved in the sense that, for a large class of MDPs, arbitrarily good performance can be achieved with high probability by using a sufficiently large number of samples. An important feature of our proof technique is that it permits the study of weighted Lp-norm performance bounds. As a result, our technique applies to a large class of function-approximation methods (e.g., neural networks, adaptive regression trees, kernel machines, locally weighted learning), and our bounds scale well with the effective horizon of the MDP. The bounds depend on the stochastic stability properties of the MDP: they scale with the discounted-average concentrability of the future-state distributions. They also depend on a new measure of the approximation power of the function space, the inherent Bellman residual, which reflects how well the function space is "aligned" with the dynamics and rewards of the MDP. The conditions of the main result, as well as the concepts introduced in the analysis, are extensively discussed and compared to previous theoretical results. Numerical experiments are used to substantiate the theoretical findings.
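
To make the procedure analyzed above concrete, here is a minimal sketch of one sampling-based FVI loop in Python. The names are illustrative assumptions, not the paper's specification: it presumes a generative model exposed as a function sample(state, action) returning a (next_state, reward) pair, a finite action set, and a scikit-learn-style regressor standing in for the function space (the paper's analysis covers a much broader class of approximators).

```python
# Hypothetical sketch of sampling-based fitted value iteration (FVI).
# Assumed interface (not from the paper): sample(state, action) -> (next_state, reward)
# is the generative model; `states` is an (n_states, state_dim) array of sampled
# states; `actions` is a finite iterable of actions.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_value_iteration(sample, states, actions, gamma=0.95,
                           n_iterations=50, n_next_samples=10):
    """Repeat V_{k+1} ~ fit(T V_k), where T is the Bellman optimality
    operator, estimated by Monte Carlo over the generative model."""
    X = np.asarray(states)                    # regression inputs, shape (n, d)
    V = lambda S: np.zeros(S.shape[0])        # V_0 = 0
    for _ in range(n_iterations):
        targets = np.empty(X.shape[0])
        for i, s in enumerate(X):
            q_estimates = []
            for a in actions:
                # Monte Carlo estimate of r(s, a) + gamma * E[V(s')]
                total = 0.0
                for _ in range(n_next_samples):
                    s_next, r = sample(s, a)  # generative-model call
                    total += r + gamma * V(np.asarray(s_next).reshape(1, -1))[0]
                q_estimates.append(total / n_next_samples)
            targets[i] = max(q_estimates)     # greedy backup over actions
        # Regression step: project the sampled Bellman backup onto the
        # function space (an extra-trees regressor as a stand-in here).
        model = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
        V = model.predict
    return V
```

The final regression step in each iteration is the projection onto the function space whose quality the inherent Bellman residual quantifies, while the Monte Carlo averaging over n_next_samples draws is where the finite-sample error enters the bounds.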