We consider the bias and variance in value function estimates that arise from using an empirical model instead of the true model. We analyze this bias and variance for Markov processes both from a classical (frequentist) statistical point of view and in a Bayesian setting. Using a second-order approximation, we provide explicit expressions for the bias and the variance in terms of the transition counts and the reward statistics. We present supporting experiments on artificial Markov chains and on a large transactional database provided by a mail-order catalog firm.
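To make the setting concrete, the following is a minimal sketch (not the paper's code or its analytical expressions) of how an empirical model induces bias and variance in a value function estimate: the value of a fixed-policy Markov reward process is computed from transition frequencies estimated from sampled data, and the bias and variance of that estimate are measured by Monte Carlo over datasets. The chain size, discount factor, reward vector, and sample sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, gamma = 5, 0.9

# "True" model: a random transition matrix P and reward vector r (assumed for illustration).
P = rng.dirichlet(np.ones(n_states), size=n_states)
r = rng.normal(size=n_states)

def value(P, r, gamma):
    # V = (I - gamma * P)^{-1} r for a discounted Markov reward process.
    return np.linalg.solve(np.eye(len(r)) - gamma * P, r)

V_true = value(P, r, gamma)

# Estimate V from empirical models: for each state, draw a fixed number of
# transitions and use the empirical frequencies as the transition model.
n_samples, n_trials = 20, 2000
estimates = np.empty((n_trials, n_states))
for t in range(n_trials):
    counts = np.vstack([rng.multinomial(n_samples, P[s]) for s in range(n_states)])
    P_hat = counts / n_samples
    estimates[t] = value(P_hat, r, gamma)

bias = estimates.mean(axis=0) - V_true   # systematic error from plugging in P_hat
variance = estimates.var(axis=0)         # spread of the estimate across datasets
print("empirical bias per state:    ", np.round(bias, 4))
print("empirical variance per state:", np.round(variance, 4))
```

The Monte Carlo quantities printed here are what the paper's second-order approximation expresses in closed form as functions of the transition counts and reward statistics; only the transition model is treated as uncertain in this sketch, whereas the paper also accounts for reward estimation.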