Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems

Authors:
Amir Massoud Farahmand; Mohammad Ghavamzadeh; Csaba Szepesvári; Shie Mannor
Affiliations:
Dept. of Computing Science, University of Alberta, Edmonton, AB, Canada; Dept. of Computing Science, University of Alberta, Edmonton, AB, Canada; Dept. of Computing Science, University of Alberta, Edmonton, AB, Canada; Dept. of Electrical & Computer Eng., McGill University, Montreal, QC, Canada
Venue:
ACC'09: Proceedings of the 2009 American Control Conference
Year:
2009

Abstract

Reinforcement learning with linear and non-linear function approximation has been studied extensively in the last decade. However, unlike other fields of machine learning such as supervised learning, reinforcement learning has not yet thoroughly addressed the effect of finite sample sizes. In this paper, we propose to use L2 regularization to control the complexity of the value function in reinforcement learning and planning problems. We consider the Regularized Fitted Q-Iteration algorithm and provide generalization bounds that account for small sample sizes. Finally, we use a realistic visual-servoing problem to illustrate the benefits of the regularization procedure.
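To make the idea of L2-regularized fitted Q-iteration concrete, here is a minimal sketch. It is not the authors' implementation. It assumes a finite action set, a Q-function that is linear in a hypothetical feature map `features(s, a, n_actions)`, and a fixed batch of `(s, a, r, s_next)` transitions. Under these assumptions, each iteration regresses the targets `r + gamma * max_b Q_k(s_next, b)` with ridge (L2-penalized) least squares. The feature map, the toy 1-D dynamics, and the values of `gamma`, `lam`, and `n_iters` are all illustrative choices.

```python
import numpy as np


def features(s, a, n_actions):
    """Hypothetical feature map phi(s, a): polynomial features of a scalar
    state, placed in the block that belongs to action a."""
    base = np.array([1.0, s, s * s])
    phi = np.zeros(len(base) * n_actions)
    phi[a * len(base):(a + 1) * len(base)] = base
    return phi


def regularized_fqi(transitions, n_actions, gamma=0.9, lam=1e-2, n_iters=50):
    """Fitted Q-iteration where every regression step is ridge regression.

    transitions: list of (s, a, r, s_next) tuples collected beforehand.
    Returns w such that Q(s, a) is approximated by features(s, a) @ w.
    """
    X = np.array([features(s, a, n_actions) for s, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    # Features of every action at each next state, shape (N, n_actions, d).
    X_next = np.array([[features(s2, b, n_actions) for b in range(n_actions)]
                       for _, _, _, s2 in transitions])
    n, d = X.shape
    w = np.zeros(d)
    # Normal-equation matrix of min_w (1/n)||X w - y||^2 + lam * ||w||^2.
    A = X.T @ X + n * lam * np.eye(d)
    for _ in range(n_iters):
        # Regression targets: r + gamma * max_b Q_k(s_next, b).
        targets = rewards + gamma * (X_next @ w).max(axis=1)
        # Ridge solution for the current targets.
        w = np.linalg.solve(A, X.T @ targets)
    return w


# Usage on a toy (assumed) problem: state in [-1, 1], action 0 steps left,
# action 1 steps right, and the reward is -|s_next|.
rng = np.random.default_rng(0)
data = []
for _ in range(200):
    s = float(rng.uniform(-1.0, 1.0))
    a = int(rng.integers(2))
    s_next = float(np.clip(s + (0.1 if a == 1 else -0.1), -1.0, 1.0))
    data.append((s, a, -abs(s_next), s_next))

w = regularized_fqi(data, n_actions=2)


def q_value(s, a):
    return features(s, a, 2) @ w


print(f"Q(0.5, left) = {q_value(0.5, 0):.3f}, Q(0.5, right) = {q_value(0.5, 1):.3f}")
```

The factor `n * lam` in `A` scales the penalty relative to the averaged empirical loss, so a given `lam` keeps roughly the same strength as the batch grows. Larger `lam` shrinks `w` and gives a smoother, lower-complexity Q estimate. This is the trade-off the regularization is meant to control when samples are scarce.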