Optimal Online Learning Procedures for Model-Free Policy Evaluation

  • Authors:
  • Tsuyoshi Ueno;Shin-Ichi Maeda;Motoaki Kawanabe;Shin Ishii

  • Affiliations:
  • Graduate School of Informatics, Kyoto University;Graduate School of Informatics, Kyoto University;Fraunhofer FIRST and Berlin Institute of Technology, Germany;Graduate School of Informatics, Kyoto University

  • Venue:
  • ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part II
  • Year:
  • 2009

Abstract

In this study, we extend the framework of semiparametric statistical inference, recently introduced to reinforcement learning [1], to online learning procedures for policy evaluation. This generalization enables us to investigate the statistical properties of value function estimators obtained by both batch and online procedures in a unified way, in terms of estimating functions. Furthermore, we propose a novel online learning algorithm with optimal estimating functions, which achieve the minimum estimation error. Our theoretical developments are confirmed using a simple chain walk problem.
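
To make the setting concrete, the following is a minimal sketch of online, model-free policy evaluation on a small chain walk. It uses standard tabular TD(0), not the optimal estimating-function algorithm proposed in the paper. The chain dynamics (a fixed random-walk policy with reflecting ends), the reward of 1 for landing on an end state, the discount factor, the step-size schedule, and all function names are assumptions chosen for illustration only.

```python
import random


def transition(state, move, n_states):
    """Assumed chain dynamics: move left/right, stay put at the ends."""
    next_state = min(max(state + move, 0), n_states - 1)
    # Assumed reward: 1 for landing on either end of the chain, else 0.
    reward = 1.0 if next_state in (0, n_states - 1) else 0.0
    return next_state, reward


def td0_evaluate(n_states=5, gamma=0.9, n_steps=200_000, seed=0):
    """Online TD(0) estimate of the value function from one sample path."""
    rng = random.Random(seed)
    values = [0.0] * n_states
    visits = [0] * n_states
    state = rng.randrange(n_states)
    for _ in range(n_steps):
        # Fixed policy under evaluation: step left or right with equal probability.
        next_state, reward = transition(state, rng.choice((-1, 1)), n_states)
        visits[state] += 1
        step_size = visits[state] ** -0.75  # decaying per-state step size
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += step_size * td_error
        state = next_state
    return values


def exact_values(n_states=5, gamma=0.9, sweeps=1000):
    """Model-based reference: iterate the Bellman equation for the same policy."""
    values = [0.0] * n_states
    for _ in range(sweeps):
        new_values = []
        for state in range(n_states):
            total = 0.0
            for move in (-1, 1):
                next_state, reward = transition(state, move, n_states)
                total += 0.5 * (reward + gamma * values[next_state])
            new_values.append(total)
        values = new_values
    return values


if __name__ == "__main__":
    for s, (est, ref) in enumerate(zip(td0_evaluate(), exact_values())):
        print(f"state {s}: TD(0) estimate {est:.3f}, reference {ref:.3f}")
```

The online estimate uses only sampled transitions, while `exact_values` uses the known model and serves purely as a reference for measuring the estimation error of the online procedure.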