A note on the convergence of policy iteration in Markov decision processes with compact action spaces

  • Authors:
  • A. Y. Golubin

  • Affiliations:
  • Department of Operations Research, Moscow Institute of Electronics and Mathematics, B. Trechsvjatitelsky per., 3/12, Moscow, 109028, Russia

  • Venue:
  • Mathematics of Operations Research
  • Year:
  • 2003


Abstract

The undiscounted, unichain, finite-state Markov decision process with compact action space is studied. We provide a counterexample to a result of Hordijk and Puterman (1987) and give an alternative proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis relies essentially on a two-term matrix representation for the relative value vectors generated by the policy iteration procedure.
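The paper concerns policy iteration for the average-reward (undiscounted) criterion, where policy evaluation computes a gain and a relative value vector by pinning the relative value of a reference state to zero. A minimal sketch of that scheme is below, assuming a finite action set for illustration (the paper itself treats compact action spaces); the function name, the specific linear-system setup, and the example MDP are illustrative, not taken from the paper.

```python
import numpy as np

def policy_iteration_avg(P, r, ref=0, max_iter=100):
    """Average-reward policy iteration for a unichain finite-state MDP.

    P:   array (A, S, S), P[a, s, s'] = transition probability under action a
    r:   array (A, S), r[a, s] = one-step reward for action a in state s
    ref: reference state whose relative value is fixed to 0

    Returns (policy, gain g, relative value vector h).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve g*1 + (I - P_pi) h = r_pi with h[ref] = 0.
        P_pi = P[policy, np.arange(S)]        # (S, S) rows for chosen actions
        r_pi = r[policy, np.arange(S)]        # (S,)
        M = np.zeros((S + 1, S + 1))
        M[:S, 0] = 1.0                        # column for the gain g
        M[:S, 1:] = np.eye(S) - P_pi
        M[S, 1 + ref] = 1.0                   # pin h[ref] = 0
        b = np.concatenate([r_pi, [0.0]])
        sol = np.linalg.lstsq(M, b, rcond=None)[0]
        g, h = sol[0], sol[1:]
        # Policy improvement: maximize r(s, a) + sum_{s'} P(s'|s, a) h(s').
        q = r + P @ h                         # (A, S)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                             # improvement step is stationary
        policy = new_policy
    return policy, g, h

# Toy 2-state, 2-action unichain MDP: both actions alternate between the
# states, so every stationary policy has the same recurrent class.
P = np.array([[[0., 1.], [1., 0.]],
              [[0., 1.], [1., 0.]]])
r = np.array([[1., 1.],
              [2., 0.]])
pol, g, h = policy_iteration_avg(P, r)        # optimal gain is (2 + 1)/2
```

Because every stationary policy here shares the same recurrent class, the unichain condition of the abstract holds, and the evaluation system has a unique solution once `h[ref]` is pinned.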