Dynamic programming: deterministic and stochastic models
Artificial intelligence: a modern approach
Dynamic Programming and Optimal Control
Introduction to Reinforcement Learning
Algorithms for sequential decision-making
On policy iteration as a Newton's method and polynomial policy iteration algorithms
Eighteenth National Conference on Artificial Intelligence
A New Complexity Result on Solving the Markov Decision Problem
Mathematics of Operations Research
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths
Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09)
A Strongly Polynomial Algorithm for Controlled Queues
Mathematics of Operations Research
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths
ACM Transactions on Algorithms (TALG)
Exponential lower bounds for policy iteration
Proceedings of the 37th International Colloquium on Automata, Languages and Programming (ICALP'10), Part II
Non-oblivious strategy improvement
Proceedings of the 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR'10)
Subexponential lower bounds for randomized pivoting rules for the simplex algorithm
Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing
On strategy improvement algorithms for simple stochastic games
Journal of Discrete Algorithms
On strategy improvement algorithms for simple stochastic games
Proceedings of the 7th International Conference on Algorithms and Complexity (CIAC'10)
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm that searches the policy space, whose size is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial worst-case upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
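For readers unfamiliar with the algorithm whose iteration count the paper bounds, the following is a minimal sketch of classic policy iteration on a small, fully known MDP (it illustrates the alternation of exact evaluation and greedy improvement; it is not the paper's analysis, and the toy MDP is invented for illustration):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite MDP.

    P: transition probabilities, shape (S, A, S); R: expected rewards, shape (S, A).
    Alternates exact policy evaluation (solving a linear system) with greedy
    policy improvement, terminating when the policy no longer changes.
    """
    S, A = R.shape
    policy = np.zeros(S, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[np.arange(S), policy]          # (S, S) transitions under policy
        R_pi = R[np.arange(S), policy]          # (S,) rewards under policy
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the Q-values of V.
        Q = R + gamma * P @ V                   # (S, A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # stable policy is optimal
            return policy, V
        policy = new_policy

if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP: action 0 = stay, action 1 = switch.
    # Staying in state 1 yields reward 1; everything else yields 0.
    P = np.zeros((2, 2, 2))
    P[0, 0] = [1, 0]; P[0, 1] = [0, 1]
    P[1, 0] = [0, 1]; P[1, 1] = [1, 0]
    R = np.array([[0.0, 0.0], [1.0, 0.0]])
    policy, V = policy_iteration(P, R, gamma=0.9)
    print(policy, V)  # optimal policy [1 0], values approximately [9.0, 10.0]
```

Each iteration strictly improves the evaluated policy (or terminates), so the number of iterations is at most the number of policies, |A|^|S|; the abstract's contribution is a much tighter, discount-independent bound on that count.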