A note on the convergence of policy iteration in Markov decision processes with compact action spaces

  • Authors:
  • A. Y. Golubin

  • Affiliations:
  • Department of Operations Research, Moscow Institute of Electronics and Mathematics, B. Trechsvjatitelsky per., 3/12, Moscow, 109028, Russia

  • Venue:
  • Mathematics of Operations Research
  • Year:
  • 2003


Abstract

The undiscounted, unichain, finite-state Markov decision process with compact action space is studied. We provide a counterexample to a result of Hordijk and Puterman (1987) and give an alternative proof of the convergence of policy iteration under the condition that there exists a state that is recurrent under every stationary policy. The analysis relies essentially on a two-term matrix representation for the relative value vectors generated by the policy iteration procedure.
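The paper concerns policy iteration for the average-reward (undiscounted) criterion, where policy evaluation computes a gain and a relative value vector by pinning the relative value of a reference state to zero. A minimal sketch of that scheme is below, assuming a finite action set for illustration (the paper itself treats compact action spaces); the function name, the specific linear-system setup, and the example MDP are illustrative, not taken from the paper.

```python
import numpy as np

def policy_iteration_avg(P, r, ref=0, max_iter=100):
    """Average-reward policy iteration for a unichain finite-state MDP.

    P:   array (A, S, S), P[a, s, s'] = transition probability under action a
    r:   array (A, S), r[a, s] = one-step reward for action a in state s
    ref: reference state whose relative value is fixed to 0

    Returns (policy, gain g, relative value vector h).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve g*1 + (I - P_pi) h = r_pi with h[ref] = 0.
        P_pi = P[policy, np.arange(S)]        # (S, S) rows for chosen actions
        r_pi = r[policy, np.arange(S)]        # (S,)
        M = np.zeros((S + 1, S + 1))
        M[:S, 0] = 1.0                        # column for the gain g
        M[:S, 1:] = np.eye(S) - P_pi
        M[S, 1 + ref] = 1.0                   # pin h[ref] = 0
        b = np.concatenate([r_pi, [0.0]])
        sol = np.linalg.lstsq(M, b, rcond=None)[0]
        g, h = sol[0], sol[1:]
        # Policy improvement: maximize r(s, a) + sum_{s'} P(s'|s, a) h(s').
        q = r + P @ h                         # (A, S)
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                             # improvement step is stationary
        policy = new_policy
    return policy, g, h

# Toy 2-state, 2-action unichain MDP: both actions alternate between the
# states, so every stationary policy has the same recurrent class.
P = np.array([[[0., 1.], [1., 0.]],
              [[0., 1.], [1., 0.]]])
r = np.array([[1., 1.],
              [2., 0.]])
pol, g, h = policy_iteration_avg(P, r)        # optimal gain is (2 + 1)/2
```

Because every stationary policy here shares the same recurrent class, the unichain condition of the abstract holds, and the evaluation system has a unique solution once `h[ref]` is pinned.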