Estimating the value of a discounted reward process

Authors:
Moshe Haviv; Martin L. Puterman
Affiliations:
Department of Econometrics, The University of Sydney, Sydney, NSW 2006, Australia; Faculty of Commerce and Business Administration, University of British Columbia, Vancouver, Canada
Venue:
Operations Research Letters
Year:
1992

Abstract

This paper provides a differential equation that relates the expected total discounted reward of a reward process to the expected total undiscounted reward of a process that terminates at a negative binomial stopping time. The solution of this equation provides the basis for unbiased estimators of the expected total discounted reward and of its derivative with respect to the discount rate. We compare this estimator with other estimators and discuss when it might be more efficient. When rewards are positive, we show that the estimator is monotone in the sampled variate.
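The abstract builds on a standard fact. Write λ for the discount factor, which is notation assumed here. If T is a random horizon independent of the rewards r_0, r_1, ... with P(T >= n) = λ^n, then E[r_0 + ... + r_T] = Σ_n P(T >= n) E[r_n] = Σ_n λ^n E[r_n]. In words, the undiscounted reward up to a geometric stopping time is an unbiased estimator of the discounted reward. The geometric law is the negative binomial with a single required success. The sketch below illustrates only that geometric special case on a hypothetical two-state Markov reward chain and compares it with a crude truncated discounted sum. It is not the authors' negative binomial estimator, and the function names and chain are illustrative.

```python
import math
import random


def geometric_horizon(lam, u):
    """Inverse-transform sample of T with P(T >= n) = lam**n, n = 0, 1, 2, ...

    For u in (0, 1], T = floor(log(u) / log(lam)); T is nonincreasing in u.
    """
    return int(math.floor(math.log(u) / math.log(lam)))


def simulate_rewards(P, r, s0, n_steps, rng):
    """Rewards r[s_0], ..., r[s_{n_steps-1}] along one path of the chain P."""
    states = range(len(r))
    rewards, s = [], s0
    for _ in range(n_steps):
        rewards.append(r[s])
        s = rng.choices(states, weights=P[s])[0]  # next state drawn from row P[s]
    return rewards


def geometric_stopping_estimate(P, r, s0, lam, rng):
    """Undiscounted reward r_0 + ... + r_T, with T independent of the path."""
    T = geometric_horizon(lam, 1.0 - rng.random())  # 1 - U lies in (0, 1]
    return sum(simulate_rewards(P, r, s0, T + 1, rng))


def truncated_discounted_estimate(P, r, s0, lam, horizon, rng):
    """Crude comparison: discounted sum over a fixed horizon (bias of order lam**horizon)."""
    path = simulate_rewards(P, r, s0, horizon, rng)
    return sum((lam ** n) * x for n, x in enumerate(path))


def exact_value(P, r, lam, iters=2000):
    """Solve v = r + lam * P v by successive approximation, for checking."""
    v = [0.0] * len(r)
    for _ in range(iters):
        v = [r[i] + lam * sum(P[i][j] * v[j] for j in range(len(r)))
             for i in range(len(r))]
    return v


if __name__ == "__main__":
    rng = random.Random(12345)
    P = [[0.9, 0.1], [0.5, 0.5]]  # hypothetical two-state transition matrix
    r = [1.0, 3.0]                # hypothetical nonnegative rewards
    lam, n = 0.9, 10000
    geo = sum(geometric_stopping_estimate(P, r, 0, lam, rng) for _ in range(n)) / n
    trunc = sum(truncated_discounted_estimate(P, r, 0, lam, 60, rng) for _ in range(n)) / n
    print(f"exact={exact_value(P, r, lam)[0]:.4f} geometric={geo:.4f} truncated={trunc:.4f}")
```

Sampling T by inversion makes T nonincreasing in the uniform variate u. With nonnegative rewards and the reward path held fixed, the geometric estimate is therefore nondecreasing as u decreases. This is a simple analogue of the monotonicity property mentioned in the abstract, stated here only for this sketch.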