Error bounds in reinforcement learning policy evaluation

  • Authors:
  • Fletcher Lu

  • Affiliations:
  • University of Waterloo, Waterloo, Ontario, Canada

  • Venue:
  • AI'05: Proceedings of the 18th Canadian Society for Computational Studies of Intelligence Conference on Advances in Artificial Intelligence
  • Year:
  • 2005

Abstract

With the advent of Kearns & Singh's (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as an error bound for the Monte Carlo matrix inversion policy evaluation method. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo matrix inversion (MCMI) and temporal difference (TD) estimation methods for policy evaluation. We use these bounds to confirm the generally held notion that the model-based estimation methods, ML and MCMI, are more accurate than the model-free TD method. With our error bounds, we are also able to specify the parameters and conditions that affect each method's estimation accuracy.
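
To make the comparison concrete, the following is a minimal illustrative sketch (not taken from the paper) of the two estimator families being compared: the model-free TD(0) estimator and the model-based maximum likelihood (certainty-equivalence) estimator, evaluated on a small Markov reward process induced by a fixed policy. The transition matrix P, reward vector r, discount factor gamma, step size, and sample size are arbitrary assumptions chosen for illustration; the MCMI estimator, which approximates the matrix inverse by sampled random walks, is omitted for brevity.

# Illustrative sketch, not the paper's method: contrasts model-free TD(0)
# with the model-based maximum-likelihood (certainty-equivalence) estimator
# for policy evaluation. P, r, gamma, alpha, and the sample size are
# arbitrary choices for illustration; rewards are assumed known.
import numpy as np

rng = np.random.default_rng(0)

# A 3-state Markov reward process induced by a fixed policy.
P = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Exact values V = (I - gamma P)^{-1} r, the quantity both estimators target.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

def sample_trajectory(n_steps, s0=0):
    """Generate one trajectory of (state, reward, next_state) transitions."""
    s, traj = s0, []
    for _ in range(n_steps):
        s_next = rng.choice(3, p=P[s])
        traj.append((s, r[s], s_next))
        s = s_next
    return traj

traj = sample_trajectory(5000)

# Model-free TD(0): bootstrap the value estimate along the trajectory.
V_td = np.zeros(3)
alpha = 0.05
for s, rew, s_next in traj:
    V_td[s] += alpha * (rew + gamma * V_td[s_next] - V_td[s])

# Model-based ML (certainty equivalence): estimate P from transition counts,
# then solve the resulting linear system exactly.
counts = np.zeros((3, 3))
for s, _, s_next in traj:
    counts[s, s_next] += 1
P_hat = counts / counts.sum(axis=1, keepdims=True)
V_ml = np.linalg.solve(np.eye(3) - gamma * P_hat, r)

print("true V :", np.round(V_true, 3))
print("TD(0)  :", np.round(V_td, 3))
print("ML     :", np.round(V_ml, 3))

In this toy setup the ML estimate typically tracks the true values more closely than TD(0) for the same data, which is the kind of accuracy gap the paper's error bounds quantify rigorously.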