Reinforcement learning with replacing eligibility traces
Machine Learning - Special issue on reinforcement learning
Analytical Mean Squared Error Curves for Temporal Difference Learning
Machine Learning
Finite-sample convergence rates for Q-learning and indirect algorithms
Proceedings of the 1998 Conference on Advances in Neural Information Processing Systems 11
Introduction to Reinforcement Learning
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Investigating the Maximum Likelihood Alternative to TD(lambda)
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Bias-Variance Error Bounds for Temporal Difference Updates
COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Monte Carlo matrix inversion policy evaluation
UAI '03 Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence
Building on Kearns and Singh's (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as an error bound for Monte Carlo matrix inversion policy evaluation. We provide the first direct comparison among the error bounds of the maximum likelihood (ML), Monte Carlo matrix inversion (MCMI), and temporal difference (TD) estimation methods for policy evaluation. We use these bounds to confirm the generally held notion that the model-based ML and MCMI methods are more accurate than the model-free TD method. Our error bounds also let us identify the parameters and conditions that affect each method's estimation accuracy.
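The contrast the abstract draws between model-free TD and model-based ML estimation can be illustrated with a small sketch. The Markov reward process below (three states, rewards, and transition matrix) is a hypothetical toy example, not taken from the paper: TD(0) updates value estimates incrementally from sampled transitions, while the ML (certainty-equivalence) approach first estimates the transition model from the same data and then solves the Bellman equations exactly for that estimated model.

```python
import numpy as np

# Hypothetical toy Markov reward process: states 0, 1, and an absorbing state 2.
# The true model below is used only to generate trajectories; the estimators
# see samples only.
rng = np.random.default_rng(0)
P = np.array([[0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9],
              [0.0, 0.0, 1.0]])
R = np.array([1.0, 0.5, 0.0])   # expected reward received on leaving each state
gamma = 0.9

def sample_episode(start=0, max_len=100):
    """Roll out one trajectory of (state, reward, next_state) transitions."""
    s, traj = start, []
    for _ in range(max_len):
        if s == 2:                       # absorbing state: stop
            break
        s_next = rng.choice(3, p=P[s])
        traj.append((s, R[s], s_next))
        s = s_next
    return traj

episodes = [sample_episode() for _ in range(500)]

# Model-free TD(0): incremental bootstrap updates with a fixed step size.
V_td = np.zeros(3)
alpha = 0.05
for ep in episodes:
    for s, r, s_next in ep:
        V_td[s] += alpha * (r + gamma * V_td[s_next] - V_td[s])

# Model-based ML (certainty equivalence): estimate transition frequencies,
# then solve the Bellman equations V = R + gamma * P_hat V exactly.
counts = np.zeros((3, 3))
for ep in episodes:
    for s, _, s_next in ep:
        counts[s, s_next] += 1
P_hat = np.eye(3)                        # unvisited states default to self-loops
for s in range(3):
    if counts[s].sum() > 0:
        P_hat[s] = counts[s] / counts[s].sum()
V_ml = np.linalg.solve(np.eye(3) - gamma * P_hat, R)
```

With the same 500 episodes, both estimators approach the true values, which is consistent with the paper's theme that the model-based estimate typically makes better use of the data, since it amortizes every observed transition into one exact Bellman solve rather than a sequence of small stochastic updates.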