Regularized Fitted Q-Iteration: Application to Planning

  • Authors:
  • Amir Massoud Farahmand; Mohammad Ghavamzadeh; Csaba Szepesvári; Shie Mannor

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, Canada AB T6G 2E8 (Farahmand, Ghavamzadeh, Szepesvári); Department of Electrical & Computer Engineering, McGill University, Montreal, Canada QC H3A 2A7 (Mannor)

  • Venue:
  • Recent Advances in Reinforcement Learning
  • Year:
  • 2008

Abstract

We consider planning in a Markovian decision problem, i.e., the problem of finding a good policy given access to a generative model of the environment. We propose to use fitted Q-iteration with penalized (or regularized) least-squares regression as the regression subroutine to address the problem of controlling model complexity. The algorithm is presented in detail for the case when the function space is a reproducing-kernel Hilbert space underlying a user-chosen kernel function. We derive bounds on the quality of the solution and argue that data-dependent penalties can lead to almost optimal performance. A simple example is used to illustrate the benefits of using a penalized procedure.
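The abstract's core idea can be sketched in code: at each iteration of fitted Q-iteration, the regression subroutine is kernel ridge regression, i.e., least squares in an RKHS with an L2 penalty on the RKHS norm. The following is a minimal illustrative sketch, not the authors' implementation; the toy MDP, the Gaussian kernel with bandwidth `bw`, and the fixed penalty `lam` are all assumptions made for the example (the paper's point is that `lam` can be chosen in a data-dependent way).

```python
import numpy as np

rng = np.random.default_rng(0)

def kernel(X, Y, bw=0.3):
    # Gaussian (RBF) kernel: a common user-chosen RKHS kernel
    d = X[:, None] - Y[None, :]
    return np.exp(-(d ** 2) / (2 * bw ** 2))

# Hypothetical generative model: 1-D state in [0, 1], two actions
# (push left / push right); reward is highest near the centre.
def step(s, a):
    s_next = np.clip(s + (0.1 if a == 1 else -0.1)
                     + 0.01 * rng.standard_normal(), 0.0, 1.0)
    return s_next, -abs(s_next - 0.5)

gamma, lam, n = 0.9, 1e-3, 200
S = rng.uniform(0, 1, n)                 # sampled states
A = rng.integers(0, 2, n)                # sampled actions
pairs = np.array([step(s, a) for s, a in zip(S, A)])
S_next, R = pairs[:, 0], pairs[:, 1]

# One coefficient vector per action; Q_a(x) = sum_i alpha[a, i] k(S_i, x)
alphas = np.zeros((2, n))

for _ in range(50):                      # fitted Q-iteration loop
    # Greedy backup targets: y_i = r_i + gamma * max_a Q(s'_i, a)
    Q_next = np.stack([kernel(S_next, S) @ alphas[a] for a in (0, 1)])
    y = R + gamma * Q_next.max(axis=0)
    # Penalized least squares (kernel ridge) per action:
    # alpha = (K + lam * n_a * I)^{-1} y on that action's samples
    for a in (0, 1):
        m = A == a
        Ka = kernel(S[m], S[m])
        alphas[a] = 0.0
        alphas[a, m] = np.linalg.solve(
            Ka + lam * m.sum() * np.eye(m.sum()), y[m])

def greedy_action(s):
    q = [float(kernel(np.array([s]), S) @ alphas[a]) for a in (0, 1)]
    return int(np.argmax(q))

# The greedy policy should push the state toward the rewarding centre.
print(greedy_action(0.1), greedy_action(0.9))
```

The penalty term `lam * n * I` shrinks the fitted Q-function toward smooth elements of the RKHS, which is exactly the model-complexity control the abstract refers to: with `lam = 0` the fit interpolates the noisy backup targets, while too large a `lam` over-smooths them.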