Error bounds in reinforcement learning policy evaluation

  • Authors:
  • Fletcher Lu

  • Affiliations:
  • University of Waterloo, Waterloo, Ontario, Canada

  • Venue:
  • AI'05: Proceedings of the 18th Canadian Society for Computational Studies of Intelligence Conference on Advances in Artificial Intelligence
  • Year:
  • 2005

Abstract

With the advent of Kearns & Singh's (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as an error bound for the Monte Carlo matrix inversion policy evaluation method. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo matrix inversion (MCMI) and temporal difference (TD) estimation methods for policy evaluation. We use these bounds to confirm the generally held notion that the model-based estimation methods, ML and MCMI, are more accurate than the model-free TD method. With our error bounds, we are also able to specify the parameters and conditions that affect each method's estimation accuracy.
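
To make the comparison concrete, the following is a minimal illustrative sketch (not taken from the paper) of the two estimator families being compared: the model-free TD(0) estimator and the model-based maximum likelihood (certainty-equivalence) estimator, evaluated on a small Markov reward process induced by a fixed policy. The transition matrix P, reward vector r, discount factor gamma, step size, and sample size are arbitrary assumptions chosen for illustration; the MCMI estimator, which approximates the matrix inverse by sampled random walks, is omitted for brevity.

# Illustrative sketch, not the paper's method: contrasts model-free TD(0)
# with the model-based maximum-likelihood (certainty-equivalence) estimator
# for policy evaluation. P, r, gamma, alpha, and the sample size are
# arbitrary choices for illustration; rewards are assumed known.
import numpy as np

rng = np.random.default_rng(0)

# A 3-state Markov reward process induced by a fixed policy.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Exact values V = (I - gamma P)^{-1} r, the quantity both estimators target.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

def sample_trajectory(n_steps, s0=0):
    """Generate one trajectory of (state, reward, next_state) transitions."""
    s, traj = s0, []
    for _ in range(n_steps):
        s_next = rng.choice(3, p=P[s])
        traj.append((s, r[s], s_next))
        s = s_next
    return traj

traj = sample_trajectory(5000)

# Model-free TD(0): bootstrap the value estimate along the trajectory.
V_td = np.zeros(3)
alpha = 0.05
for s, rew, s_next in traj:
    V_td[s] += alpha * (rew + gamma * V_td[s_next] - V_td[s])

# Model-based ML (certainty equivalence): estimate P from transition counts,
# then solve the resulting linear system exactly.
counts = np.zeros((3, 3))
for s, _, s_next in traj:
    counts[s, s_next] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
V_ml = np.linalg.solve(np.eye(3) - gamma * P_hat, r)

print("true V :", np.round(V_true, 3))
print("TD(0)  :", np.round(V_td, 3))
print("ML     :", np.round(V_ml, 3))

In this toy setup the ML estimate typically tracks the true values more closely than TD(0) for the same data, which is the kind of accuracy gap the paper's error bounds quantify rigorously.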