This communiqué presents an algorithm called "policy set iteration" (PSI) for solving infinite-horizon discounted Markov decision processes with finite state and action spaces, as a simple generalization of policy iteration (PI). PSI generates a monotonically improving sequence of stationary Markovian policies {π_k^*} by manipulating a set of policies at each iteration k, as opposed to PI's manipulation of a single policy. When the set involved with PSI at iteration k contains N sample policies generated independently from a given distribution d, the probability that the expected value (with respect to an initial state distribution) of a policy sampled from d exceeds that of π_k^* converges to zero at rate O(N^{-k}). Moreover, PSI converges to an optimal policy no slower than PI in terms of the number of iterations, for any d.
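The abstract describes PSI as a set-based generalization of PI's evaluate-then-improve loop. The following is a minimal sketch of one plausible reading: at each iteration, the current policy is pooled with N randomly sampled policies, and the improvement step is performed greedily against the pointwise maximum of the set's value functions. All names (`evaluate`, `psi`), the uniform sampling distribution, and the exact set-improvement operator are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def evaluate(P, R, gamma, pi):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.

    P: (n_s, n_a, n_s) transition kernel, R: (n_s, n_a) rewards,
    pi: (n_s,) deterministic policy (action index per state)."""
    n = len(pi)
    P_pi = P[np.arange(n), pi]   # (n_s, n_s) transitions under pi
    r_pi = R[np.arange(n), pi]   # (n_s,) one-step rewards under pi
    return np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)

def psi(P, R, gamma, n_samples=5, max_iter=100, seed=None):
    """Policy set iteration sketch.

    Each iteration pools the current policy with n_samples policies
    drawn uniformly at random (a stand-in for the distribution d),
    then improves greedily against the pointwise max of the set's
    value functions. With n_samples=0 this reduces to plain PI."""
    rng = np.random.default_rng(seed)
    n_s, n_a = R.shape
    pi = np.zeros(n_s, dtype=int)          # arbitrary initial policy
    for _ in range(max_iter):
        policies = [pi] + [rng.integers(n_a, size=n_s)
                           for _ in range(n_samples)]
        # Pointwise maximum over the value functions of the set.
        v_max = np.max([evaluate(P, R, gamma, p) for p in policies], axis=0)
        q = R + gamma * P @ v_max          # (n_s, n_a) Q-values vs. v_max
        new_pi = q.argmax(axis=1)
        if np.array_equal(new_pi, pi):     # greedy policy stabilized
            return pi
        pi = new_pi
    return pi
```

On a toy two-state MDP where action 1 in state 0 yields reward 1 and state 1 is absorbing with zero reward, the sketch recovers the obviously optimal action in state 0; the monotone-improvement and O(N^{-k}) claims of the abstract are properties of the paper's actual operator, which this sketch only approximates.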