Mathematics of Operations Research
The undiscounted, unichain, finite-state Markov decision process with a compact action space is studied. We provide a counterexample to a result in Hordijk and Puterman (1987) and give an alternative proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis relies essentially on a two-term matrix representation of the relative value vectors generated by the policy iteration procedure.
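For readers unfamiliar with the procedure whose convergence is at issue, here is a minimal NumPy sketch of textbook policy iteration for a unichain average-cost MDP. It is not the paper's proof technique and does not implement its two-term matrix representation. It assumes a finite action set (for example a discretisation of the compact action space), costs to be minimised, and a hypothetical parameter `ref_state` naming a state assumed recurrent under every stationary policy, which is used to normalise the relative value vector by h(ref_state) = 0. The function names and the two-state example data are illustrative only.

```python
import numpy as np


def evaluate_policy(P, c, policy, ref_state):
    """Solve g + h(s) = c(s, pi(s)) + sum_{s'} P(s' | s, pi(s)) h(s'), with h(ref_state) = 0.

    P has shape (S, A, S) and c has shape (S, A). Under the unichain assumption
    this (S + 1) x (S + 1) linear system is nonsingular.
    """
    n = P.shape[0]
    idx = np.arange(n)
    P_pi = P[idx, policy, :]          # (S, S) transition matrix under the policy
    c_pi = c[idx, policy]             # (S,) one-step costs under the policy
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = np.eye(n) - P_pi      # coefficients of the relative values h
    A[:n, n] = 1.0                    # coefficient of the gain g
    A[n, ref_state] = 1.0             # normalisation row: h(ref_state) = 0
    b = np.concatenate([c_pi, [0.0]])
    sol = np.linalg.solve(A, b)
    return sol[n], sol[:n]            # (gain g, relative value vector h)


def policy_iteration(P, c, ref_state=0, policy=None, tol=1e-10, max_iter=1000):
    """Average-cost policy iteration over a FINITE action set (an assumption here)."""
    n, m = c.shape
    policy = np.zeros(n, dtype=int) if policy is None else np.asarray(policy, dtype=int).copy()
    for _ in range(max_iter):
        g, h = evaluate_policy(P, c, policy, ref_state)
        q = c + P @ h                 # (S, A): c(s, a) + sum_{s'} P(s' | s, a) h(s')
        best = q.argmin(axis=1)
        # Switch only where another action is strictly better; keeping the current
        # action on ties is the usual safeguard against cycling.
        current = q[np.arange(n), policy]
        improve = q[np.arange(n), best] < current - tol
        new_policy = np.where(improve, best, policy)
        if np.array_equal(new_policy, policy):
            return policy, g, h
        policy = new_policy
    raise RuntimeError("policy iteration did not converge within max_iter")


# Hypothetical two-state, two-action example; every transition probability is
# positive, so state 0 is recurrent under every stationary policy.
P = np.array([
    [[0.5, 0.5], [0.9, 0.1]],   # state 0: actions 0 and 1
    [[0.8, 0.2], [0.2, 0.8]],   # state 1: actions 0 and 1
])
c = np.array([[2.0, 3.0], [1.0, 0.5]])
policy, g, h = policy_iteration(P, c, ref_state=0)
print(policy.tolist(), round(float(g), 4))  # [0, 1] 0.9286
```

Starting from the all-zeros policy, the sketch switches state 1 to action 1 after one improvement step and then stops; the resulting gain 13/14 matches the smallest long-run average cost among the four deterministic stationary policies of this toy model.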