We consider the bias and variance in value function estimates that arise from using an empirical model instead of the true model. We analyze this bias and variance for Markov processes both from a classical (frequentist) statistical point of view and in a Bayesian setting. Using a second-order approximation, we provide explicit expressions for the bias and the variance in terms of the transition counts and the reward statistics. We present supporting experiments on artificial Markov chains and on a large transactional database provided by a mail-order catalog firm.
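To make the setting concrete, the following is a minimal sketch (not the paper's code or its analytical expressions) of how an empirical model induces bias and variance in a value function estimate: the value of a fixed-policy Markov reward process is computed from transition frequencies estimated from sampled data, and the bias and variance of that estimate are measured by Monte Carlo over datasets. The chain size, discount factor, reward vector, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 5, 0.9

# "True" model: a random transition matrix P and reward vector r (assumed for illustration).
P = rng.dirichlet(np.ones(n_states), size=n_states)
r = rng.normal(size=n_states)

def value(P, r, gamma):
    # V = (I - gamma * P)^{-1} r for a discounted Markov reward process.
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)

V_true = value(P, r, gamma)

# Estimate V from empirical models: for each state, draw a fixed number of
# transitions and use the empirical frequencies as the transition model.
n_samples, n_trials = 20, 2000
estimates = np.empty((n_trials, n_states))
for t in range(n_trials):
    counts = np.vstack([rng.multinomial(n_samples, P[s]) for s in range(n_states)])
    P_hat = counts / n_samples
    estimates[t] = value(P_hat, r, gamma)

bias = estimates.mean(axis=0) - V_true   # systematic error from plugging in P_hat
variance = estimates.var(axis=0)         # spread of the estimate across datasets
print("empirical bias per state:    ", np.round(bias, 4))
print("empirical variance per state:", np.round(variance, 4))
```

The Monte Carlo quantities printed here are what the paper's second-order approximation expresses in closed form as functions of the transition counts and reward statistics; only the transition model is treated as uncertain in this sketch, whereas the paper also accounts for reward estimation.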