Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations Vn is generated iteratively by Vn+1 = ATVn, where T is the Bellman operator and A an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal policy are given as a function of weighted Lp-norms (p ≥ 1) of the approximation errors. The results extend the usual analysis in L∞-norm, and allow one to relate the performance of AVI to the approximation power (usually expressed in Lp-norm, for p = 1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.
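The iteration Vn+1 = ATVn can be sketched on a toy finite MDP. The transition kernel, rewards, and features below are illustrative assumptions (not from the paper); the approximation operator A is taken to be a least-squares projection onto a linear feature span, a common instance of the SL step:

```python
import numpy as np

# Minimal sketch of Approximate Value Iteration (AVI) on a hypothetical
# finite MDP. P, R, and the features Phi are made up for illustration.
np.random.seed(0)
n_states, n_actions, gamma = 6, 2, 0.9

# Random transition kernel P[a, s, s'] and reward R[s, a].
P = np.random.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = np.random.rand(n_states, n_actions)

# Linear features: A projects onto their span via least squares.
Phi = np.random.rand(n_states, 3)

def bellman(V):
    # (T V)(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
    return np.max(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)

V = np.zeros(n_states)
for n in range(200):
    target = bellman(V)                               # T V_n
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # SL / projection step
    V = Phi @ w                                       # V_{n+1} = A T V_n

# Greedy policy induced by the final approximate value function.
greedy = np.argmax(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)
print(V.round(3), greedy)
```

The paper's bounds control the gap between the value of `greedy` and the optimal policy in terms of the per-iteration projection errors ||ATVn - TVn|| measured in weighted Lp-norm.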