Feature-based methods for large scale dynamic programming
Machine Learning - Special issue on reinforcement learning
Learning in Neural Networks: Theoretical Foundations
Learning in Neural Networks: Theoretical Foundations
Neuro-Dynamic Programming
Stochastic Optimal Control: The Discrete-Time Case
Stochastic Optimal Control: The Discrete-Time Case
A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes
IJCAI '99 Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
PAC Bounds for Multi-armed Bandit and Markov Decision Processes
COLT '02 Proceedings of the 15th Annual Conference on Computational Learning Theory
Covering number bounds of certain regularized linear function classes
The Journal of Machine Learning Research
Interpolation-based Q-learning
ICML '04 Proceedings of the twenty-first international conference on Machine learning
The Journal of Machine Learning Research
Finite-Time Bounds for Fitted Value Iteration
The Journal of Machine Learning Research
Continuous-state reinforcement learning with fuzzy approximation
ALAMAS'05/ALAMAS'06/ALAMAS'07 Proceedings of the 5th , 6th and 7th European conference on Adaptive and learning agents and multi-agent systems: adaptation and multi-agent learning
COLT'06 Proceedings of the 19th annual conference on Learning Theory
Hi-index | 0.00 |
In this paper we consider sampling based fitted value iteration for discounted, large (possibly infinite) state space, finite action Markovian Decision Problems where only a generative model of the transition probabilities and rewards is available. At each step the image of the current estimate of the optimal value function under a Monte-Carlo approximation to the Bellman-operator is projected onto some function space. PAC-style bounds on the weighted Lp-norm approximation error are obtained as a function of the covering number and the approximation power of the function space, the iteration number and the sample size.