Dynamic programming: deterministic and stochastic models
Artificial intelligence: a modern approach
Dynamic Programming and Optimal Control
Introduction to Reinforcement Learning
Algorithms for sequential decision-making
On policy iteration as a Newton's method and polynomial policy iteration algorithms
Eighteenth National Conference on Artificial Intelligence
A New Complexity Result on Solving the Markov Decision Problem
Mathematics of Operations Research
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths
Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '09)
A Strongly Polynomial Algorithm for Controlled Queues
Mathematics of Operations Research
Discounted deterministic Markov decision processes and discounted all-pairs shortest paths
ACM Transactions on Algorithms (TALG)
Exponential lower bounds for policy iteration
Proceedings of the 37th International Colloquium on Automata, Languages and Programming (ICALP'10), Part II
Non-oblivious strategy improvement
Proceedings of the 16th International Conference on Logic for Programming, Artificial Intelligence, and Reasoning (LPAR'10)
Subexponential lower bounds for randomized pivoting rules for the simplex algorithm
Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing
On strategy improvement algorithms for simple stochastic games
Journal of Discrete Algorithms
On strategy improvement algorithms for simple stochastic games
Proceedings of the 7th International Conference on Algorithms and Complexity (CIAC'10)
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm that searches the policy space, whose size is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first such non-trivial worst-case upper bounds on the number of iterations required by PI to converge to the optimal policy. Our analysis also sheds new light on the manner in which PI progresses through the space of policies.
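For readers unfamiliar with the algorithm whose iteration count the paper bounds, the following is a minimal sketch of classic policy iteration on a small, fully known MDP (it illustrates the alternation of exact evaluation and greedy improvement; it is not the paper's analysis, and the toy MDP is invented for illustration):

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Policy iteration for a finite MDP.

    P: transition probabilities, shape (S, A, S); R: expected rewards, shape (S, A).
    Alternates exact policy evaluation (solving a linear system) with greedy
    policy improvement, terminating when the policy no longer changes.
    """
    S, A = R.shape
    policy = np.zeros(S, dtype=int)  # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[np.arange(S), policy]          # (S, S) transitions under policy
        R_pi = R[np.arange(S), policy]          # (S,) rewards under policy
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to the Q-values of V.
        Q = R + gamma * P @ V                   # (S, A)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):  # stable policy is optimal
            return policy, V
        policy = new_policy

if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP: action 0 = stay, action 1 = switch.
    # Staying in state 1 yields reward 1; everything else yields 0.
    P = np.zeros((2, 2, 2))
    P[0, 0] = [1, 0]; P[0, 1] = [0, 1]
    P[1, 0] = [0, 1]; P[1, 1] = [1, 0]
    R = np.array([[0.0, 0.0], [1.0, 0.0]])
    policy, V = policy_iteration(P, R, gamma=0.9)
    print(policy, V)  # optimal policy [1 0], values approximately [9.0, 10.0]
```

Each iteration strictly improves the evaluated policy (or terminates), so the number of iterations is at most the number of policies, |A|^|S|; the abstract's contribution is a much tighter, discount-independent bound on that count.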