We consider a finite-state, finite-action, infinite-horizon, discounted reward Markov decision process and study the bias and variance in the value function estimates that result from empirical estimates of the model parameters. We provide closed-form approximations for the bias and variance, which can then be used to derive confidence intervals around the value function estimates. We illustrate and validate our findings using a large database describing the transaction and mailing histories for customers of a mail-order catalog firm.
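The abstract's pipeline can be sketched numerically: estimate transition probabilities from observed transition counts, plug them into the exact discounted value formula V = (I - γP)⁻¹ r, and gauge the bias and variance of the resulting estimate. The paper's closed-form approximations are not reproduced here; the sketch below substitutes a parametric bootstrap for the same quantities, on a hypothetical 3-state chain with made-up transition probabilities and rewards.

```python
import numpy as np

rng = np.random.default_rng(0)

def value_function(P, r, gamma):
    """Exact discounted value of a fixed policy: V = (I - gamma*P)^{-1} r."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - gamma * P, r)

# Hypothetical 3-state chain under a fixed policy (illustrative data only).
P_true = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
r = np.array([1.0, 0.0, 2.0])
gamma = 0.9

# Empirical model: count transitions observed out of each state.
n_samples = 200  # transitions observed per state
counts = np.vstack([rng.multinomial(n_samples, P_true[s]) for s in range(3)])
P_hat = counts / n_samples
V_hat = value_function(P_hat, r, gamma)  # plug-in value estimate

# Parametric bootstrap: resample counts from the empirical model to
# approximate the bias and variance of the plug-in value estimate.
B = 500
V_boot = np.empty((B, 3))
for b in range(B):
    c = np.vstack([rng.multinomial(n_samples, P_hat[s]) for s in range(3)])
    V_boot[b] = value_function(c / n_samples, r, gamma)

bias_est = V_boot.mean(axis=0) - V_hat    # bootstrap bias estimate
se_est = V_boot.std(axis=0, ddof=1)       # bootstrap standard error
ci_lo = V_hat - bias_est - 1.96 * se_est  # approximate 95% confidence interval
ci_hi = V_hat - bias_est + 1.96 * se_est
```

Because the value function is a nonlinear function of the estimated transition matrix, the plug-in estimate is biased even when the transition counts themselves are unbiased; the bias correction above is one standard way to account for this, with the closed-form approximations in the paper serving the analogous role analytically.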