In this paper we consider the problem of optimizing the variance of the reward in nonstationary average Markov decision processes (MDPs). We give examples showing that the proofs of the main theorems in two papers contain mistakes: Kurano (J. Math. Anal. Appl. 123 (1987) 572) and Guo (Math. Meth. Oper. Res. 49 (1999) 87-96), both of which investigated the optimization of the variance of the sum of costs for the average MDP. We propose a variance criterion different from the one investigated by Kurano (1987) and Guo (1999), and we prove that, under appropriate conditions, there exists a Markov policy that is ε-strong variance optimal for any ε > 0.