Technical communique: Policy set iteration for Markov decision processes

  • Authors:
  • Hyeong Soo Chang

  • Venue:
  • Automatica (Journal of IFAC)
  • Year:
  • 2013

Abstract

This communique presents an algorithm called "policy set iteration" (PSI) for solving infinite-horizon discounted Markov decision processes with finite state and action spaces, as a simple generalization of policy iteration (PI). PSI generates a monotonically improving sequence of stationary Markovian policies {π_k^*} based on a set manipulation at each iteration k, as opposed to PI's single-policy manipulation. When the set involved with PSI at iteration k contains N sample policies generated independently from a given distribution d, the probability that the expected value, with respect to an initial state distribution, of a policy sampled from d exceeds that of π_k^* converges to zero at rate O(N^{-k}). Moreover, for any d, PSI converges to an optimal policy no more slowly than PI in terms of the number of iterations.
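The set-based iteration the abstract describes can be sketched for a small tabular MDP. The code below is a minimal illustration under stated assumptions, not the paper's exact algorithm: the names (`psi`, `policy_value`), the uniform policy-sampling distribution standing in for d, and the stopping rule are all assumptions. Each iteration takes the elementwise maximum of the value functions of the current policy and N sampled policies, then applies one greedy improvement step to that combined estimate; with N = 0 this reduces to standard policy iteration.

```python
import numpy as np

def policy_value(P, R, policy, gamma):
    """Exact value of a stationary policy via the linear solve
    v = (I - gamma * P_pi)^{-1} r_pi.
    P: (A, S, S) transition probabilities, R: (A, S) rewards."""
    S = P.shape[1]
    P_pi = P[policy, np.arange(S)]   # (S, S) transitions under the policy
    r_pi = R[policy, np.arange(S)]   # (S,) one-step rewards under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def psi(P, R, gamma, N=5, iters=50, rng=None):
    """Hypothetical sketch of policy set iteration (assumed interface).

    Each iteration forms the elementwise maximum of the value functions
    of the current policy and N policies sampled uniformly at random
    (a stand-in for the distribution d), then improves greedily on it.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    A, S = R.shape
    policy = np.zeros(S, dtype=int)  # arbitrary initial policy
    for _ in range(iters):
        # Elementwise max over the value functions of the policy set.
        v = policy_value(P, R, policy, gamma)
        for _ in range(N):
            sampled = rng.integers(A, size=S)
            v = np.maximum(v, policy_value(P, R, sampled, gamma))
        # Greedy (policy-improvement) step against the combined values.
        q = R + gamma * (P @ v)      # (A, S) Q-values
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                    # unchanged greedy policy: converged
        policy = new_policy
    return policy
```

As in the abstract, the sampled policies can only help: the improvement step is taken against a value estimate at least as large as the current policy's value function, so the sequence of policies is still monotonically improving.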