Model selection in reinforcement learning

  • Authors:
  • Amir-Massoud Farahmand; Csaba Szepesvári

  • Affiliation:
  • Department of Computing Science, University of Alberta, Edmonton, Canada T6G 2E8

  • Venue:
  • Machine Learning
  • Year:
  • 2011

Abstract

We consider the problem of model selection in the batch (offline, non-interactive) reinforcement learning setting, where the goal is to find an action-value function with the smallest Bellman error among a countable set of candidate functions. We propose a complexity regularization-based model selection algorithm, BErMin, and prove that it enjoys an oracle-like property: the estimator's error differs from that of an oracle, who selects the candidate with the minimum Bellman error, by only a constant factor and a small remainder term that vanishes at a parametric rate as the number of samples increases. As an application, we consider the problem in which the true action-value function belongs to an unknown member of a nested sequence of function spaces. We show that under some additional technical conditions BErMin leads to a procedure whose rate of convergence, up to a constant factor, matches that of an oracle who knows which of the nested function spaces the true action-value function belongs to, i.e., the procedure achieves adaptivity.
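
The abstract does not spell out BErMin's selection criterion, but the general shape of complexity-regularized model selection it refers to can be sketched as follows: pick the candidate minimizing an estimated Bellman error plus a penalty that grows with the candidate's complexity. This is a minimal illustrative sketch, not the paper's algorithm; the function name, the additive form of the criterion, and the numerical values below are assumptions.

```python
import numpy as np

def select_candidate(bellman_error_estimates, complexity_penalties):
    """Complexity-regularized model selection over candidate action-value
    functions (a sketch of the general idea, not the BErMin procedure):
    choose the index minimizing estimated Bellman error + complexity penalty."""
    scores = np.asarray(bellman_error_estimates) + np.asarray(complexity_penalties)
    return int(np.argmin(scores))

# Hypothetical usage: three candidates from increasingly rich function spaces.
# The error estimates and penalties here are made-up numbers for illustration;
# in practice the penalty would shrink with the sample size and grow with
# the complexity of the space the candidate was fit in.
errors = [0.42, 0.18, 0.15]
penalties = [0.01, 0.05, 0.20]
print("selected candidate:", select_candidate(errors, penalties))
```

A richer candidate may achieve a smaller estimated Bellman error yet lose out once its penalty is added; that trade-off is what lets such a procedure track the oracle's choice up to a constant factor, as the abstract states.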