Technical communique: Policy set iteration for Markov decision processes

  • Authors:
  • Hyeong Soo Chang

  • Venue:
  • Automatica (Journal of IFAC)
  • Year:
  • 2013

Abstract

This communique presents an algorithm called "policy set iteration" (PSI) for solving infinite-horizon discounted Markov decision processes with finite state and action spaces, as a simple generalization of policy iteration (PI). PSI generates a monotonically improving sequence of stationary Markovian policies {π_k^*} based on a set manipulation at each iteration k, as opposed to PI's single-policy manipulation. When the set involved with PSI at iteration k contains N sample policies generated independently from a given distribution d, the probability that the expected value, with respect to an initial state distribution, of a policy sampled from d exceeds that of π_k^* converges to zero at rate O(N^{-k}). Moreover, for any d, PSI converges to an optimal policy no more slowly than PI in terms of the number of iterations.
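The set-based iteration the abstract describes can be sketched for a small tabular MDP. The code below is a minimal illustration under stated assumptions, not the paper's exact algorithm: the names (`psi`, `policy_value`), the uniform policy-sampling distribution standing in for d, and the stopping rule are all assumptions. Each iteration takes the elementwise maximum of the value functions of the current policy and N sampled policies, then applies one greedy improvement step to that combined estimate; with N = 0 this reduces to standard policy iteration.

```python
import numpy as np

def policy_value(P, R, policy, gamma):
    """Exact value of a stationary policy via the linear solve
    v = (I - gamma * P_pi)^{-1} r_pi.
    P: (A, S, S) transition probabilities, R: (A, S) rewards."""
    S = P.shape[1]
    P_pi = P[policy, np.arange(S)]   # (S, S) transitions under the policy
    r_pi = R[policy, np.arange(S)]   # (S,) one-step rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def psi(P, R, gamma, N=5, iters=50, rng=None):
    """Hypothetical sketch of policy set iteration (assumed interface).

    Each iteration forms the elementwise maximum of the value functions
    of the current policy and N policies sampled uniformly at random
    (a stand-in for the distribution d), then improves greedily on it.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    A, S = R.shape
    policy = np.zeros(S, dtype=int)  # arbitrary initial policy
    for _ in range(iters):
        # Elementwise max over the value functions of the policy set.
        v = policy_value(P, R, policy, gamma)
        for _ in range(N):
            sampled = rng.integers(A, size=S)
            v = np.maximum(v, policy_value(P, R, sampled, gamma))
        # Greedy (policy-improvement) step against the combined values.
        q = R + gamma * (P @ v)      # (A, S) Q-values
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                    # unchanged greedy policy: converged
        policy = new_policy
    return policy
```

As in the abstract, the sampled policies can only help: the improvement step is taken against a value estimate at least as large as the current policy's value function, so the sequence of policies is still monotonically improving.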