Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations Vn is generated iteratively by Vn+1 = ATVn, where T is the Bellman operator and A an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal policy are given as a function of weighted Lp-norms (p ≥ 1) of the approximation errors. The results extend the usual analysis in L∞-norm, and allow one to relate the performance of AVI to the approximation power (usually expressed in Lp-norm, for p = 1 or 2) of the SL algorithm. We illustrate the tightness of these bounds on an optimal replacement problem.
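The iteration Vn+1 = ATVn can be sketched on a toy finite MDP. The transition kernel, rewards, and features below are illustrative assumptions (not from the paper); the approximation operator A is taken to be a least-squares projection onto a linear feature span, a common instance of the SL step:

```python
import numpy as np

# Minimal sketch of Approximate Value Iteration (AVI) on a hypothetical
# finite MDP. P, R, and the features Phi are made up for illustration.
np.random.seed(0)
n_states, n_actions, gamma = 6, 2, 0.9

# Random transition kernel P[a, s, s'] and reward R[s, a].
P = np.random.dirichlet(np.ones(n_states), size=(n_actions, n_states))
R = np.random.rand(n_states, n_actions)

# Linear features: A projects onto their span via least squares.
Phi = np.random.rand(n_states, 3)

def bellman(V):
    # (T V)(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) V(s') ]
    return np.max(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)

V = np.zeros(n_states)
for n in range(200):
    target = bellman(V)                               # T V_n
    w, *_ = np.linalg.lstsq(Phi, target, rcond=None)  # SL / projection step
    V = Phi @ w                                       # V_{n+1} = A T V_n

# Greedy policy induced by the final approximate value function.
greedy = np.argmax(R + gamma * np.einsum('ast,t->sa', P, V), axis=1)
print(V.round(3), greedy)
```

The paper's bounds control the gap between the value of `greedy` and the optimal policy in terms of the per-iteration projection errors ||ATVn - TVn|| measured in weighted Lp-norm.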