We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose fitted Q-iteration with penalized (regularized) least-squares regression as the regression subroutine, in order to control model complexity. The algorithm is presented in detail for the case when the function space is the reproducing-kernel Hilbert space induced by a user-chosen kernel function. We derive bounds on the quality of the resulting solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example illustrates the benefits of the penalized procedure.
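To make the procedure concrete, below is a minimal sketch of fitted Q-iteration with penalized least-squares regression in an RKHS, using scikit-learn's KernelRidge (kernel ridge regression) as the regression subroutine. The generative-model interface step(s, a) -> (reward, next_state), the RBF kernel choice, and all parameter values are illustrative assumptions rather than specifics from the paper; in particular, the paper advocates data-dependent penalties, whereas this sketch fixes the penalty up front.

```python
# A minimal sketch of regularized fitted Q-iteration, assuming a finite
# action set and a generative model step(s, a) -> (reward, next_state).
# Names and parameter values are illustrative, not from the paper.
import numpy as np
from sklearn.kernel_ridge import KernelRidge  # penalized least squares in an RKHS

def regularized_fitted_q_iteration(states, actions, step, discount=0.95,
                                   n_iters=50, ridge=1e-2):
    """Fit one kernel ridge regressor per action to approximate Q(., a)."""
    # Draw one transition per (state, action) pair from the generative model.
    data = {a: [step(s, a) for s in states] for a in actions}
    X = np.asarray(states)          # shape (n_states, state_dim)
    models = {a: None for a in actions}  # Q_0 = 0: "no regressor yet"

    def q_values(S):
        # Evaluate the current Q-estimate at states S, one column per action.
        return np.column_stack([
            np.zeros(len(S)) if models[a] is None else models[a].predict(S)
            for a in actions
        ])

    for _ in range(n_iters):
        new_models = {}
        for a in actions:
            rewards = np.array([r for r, _ in data[a]])
            next_states = np.asarray([s2 for _, s2 in data[a]])
            # Bellman targets: r + discount * max_a' Q_k(s', a').
            targets = rewards + discount * q_values(next_states).max(axis=1)
            # `ridge` is the penalty that controls model complexity; the paper's
            # point is that it should be chosen in a data-dependent way.
            m = KernelRidge(kernel="rbf", gamma=0.5, alpha=ridge)
            m.fit(X, targets)
            new_models[a] = m
        models = new_models

    def greedy_policy(s):
        # Act greedily with respect to the final Q-estimate.
        return actions[int(np.argmax(q_values(np.atleast_2d(s))[0]))]

    return greedy_policy
```

For instance, on a one-dimensional toy problem one would pass a list of sampled state vectors, actions = [0, 1], and a step function simulating the dynamics; the returned greedy policy can then be rolled out to gauge how the penalty level trades off under- and over-fitting of the Q-function.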