We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
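The abstract's pipeline can be sketched numerically: estimate transition probabilities from observed transition counts, plug them into the exact discounted value formula V = (I - γP)⁻¹ r, and gauge the bias and variance of the resulting estimate. The paper's closed-form approximations are not reproduced here; the sketch below substitutes a parametric bootstrap for the same quantities, on a hypothetical 3-state chain with made-up transition probabilities and rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_function(P, r, gamma):
    """Exact discounted value of a fixed policy: V = (I - gamma*P)^{-1} r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

# Hypothetical 3-state chain under a fixed policy (illustrative data only).
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Empirical model: count transitions observed out of each state.
n_samples = 200  # transitions observed per state
counts = np.vstack([rng.multinomial(n_samples, P_true[s]) for s in range(3)])
P_hat = counts / n_samples
V_hat = value_function(P_hat, r, gamma)  # plug-in value estimate

# Parametric bootstrap: resample counts from the empirical model to
# approximate the bias and variance of the plug-in value estimate.
B = 500
V_boot = np.empty((B, 3))
for b in range(B):
    c = np.vstack([rng.multinomial(n_samples, P_hat[s]) for s in range(3)])
    V_boot[b] = value_function(c / n_samples, r, gamma)

bias_est = V_boot.mean(axis=0) - V_hat    # bootstrap bias estimate
se_est = V_boot.std(axis=0, ddof=1)       # bootstrap standard error
ci_lo = V_hat - bias_est - 1.96 * se_est  # approximate 95% confidence interval
ci_hi = V_hat - bias_est + 1.96 * se_est
```

Because the value function is a nonlinear function of the estimated transition matrix, the plug-in estimate is biased even when the transition counts themselves are unbiased; the bias correction above is one standard way to account for this, with the closed-form approximations in the paper serving the analogous role analytically.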