Letters: On the bias of batch Bellman residual minimisation

Authors:
Daniel Schneegass
Affiliations:
Siemens AG, Corporate Technology, Information and Communications, Learning Systems, Otto-Hahn-Ring 6, D-81739 Munich, Germany and University of Luebeck, Institute for Neuro- and Bioinformatics, Ra ...
Venue:
Neurocomputing
Year:
2009

References:

The nature of statistical learning theory
Introduction to Reinforcement Learning
Least-squares policy iteration. The Journal of Machine Learning Research.
Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. COLT'06 Proceedings of the 19th annual conference on Learning Theory.

Abstract

This letter addresses Bellman residual minimisation in reinforcement learning for the model-free batch case. We prove the simple but not necessarily obvious result that no unbiased estimate of the Bellman residual exists for a single trajectory of observations. We then take up the recent suggestion of Antos et al. [Learning near-optimal policies with Bellman-residual minimisation based fitted policy iteration and a single sample path, in: COLT, 2006, pp. 574-588] for approximate Bellman residual minimisation and discuss its properties with respect to consistency, bias, and optimality. Finally, we suggest a way to improve the optimality.
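
To make the bias concrete, the following minimal sketch uses the standard "double sampling" decomposition rather than the letter's own proof. For a single state with a stochastic successor, the expected squared one-sample temporal-difference error equals the squared Bellman residual plus the variance of the sampled target. The function name, the toy transition probabilities, and the value estimates are all hypothetical and chosen only for illustration.

```python
import numpy as np


def bellman_residual_terms(p_next, rewards, v_next, v_s, gamma):
    """Compare the squared Bellman residual with the expected squared TD error.

    Hypothetical helper for one state s whose successors occur with
    probabilities p_next. All quantities are computed exactly (no sampling).
    """
    p_next = np.asarray(p_next, dtype=float)
    # One-sample targets r + gamma * V(s') for each possible successor.
    targets = np.asarray(rewards, dtype=float) + gamma * np.asarray(v_next, dtype=float)
    td_errors = targets - v_s
    # Squared Bellman residual: the square of the *expected* TD error.
    true_residual_sq = float(np.dot(p_next, td_errors)) ** 2
    # What a single-trajectory estimator targets: the expected *squared* TD error.
    expected_sq_td = float(np.dot(p_next, td_errors ** 2))
    # Variance of the sampled target, which is the gap between the two.
    mean_target = float(np.dot(p_next, targets))
    target_variance = float(np.dot(p_next, (targets - mean_target) ** 2))
    return true_residual_sq, expected_sq_td, target_variance


# Toy example (assumed numbers): two equally likely successors, zero reward.
true_residual_sq, expected_sq_td, target_variance = bellman_residual_terms(
    p_next=[0.5, 0.5], rewards=[0.0, 0.0], v_next=[0.0, 2.0], v_s=0.5, gamma=0.5
)
print(true_residual_sq, expected_sq_td, target_variance)
```

In this toy case the value estimate satisfies the Bellman equation exactly, so the true squared residual is zero, yet the squared one-sample error has a positive expectation equal to the target variance. With only one observed successor per visited state, that variance term cannot be separated from the residual, which is the intuition behind the non-existence result stated in the abstract.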