Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor

  • Authors:
Thomas Dueholm Hansen; Peter Bro Miltersen; Uri Zwick

  • Affiliations:
Aarhus University; Aarhus University; Tel Aviv University

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2013

Abstract

Ye [2011] showed recently that the simplex method with Dantzig’s pivoting rule, as well as Howard’s policy iteration algorithm, solve discounted Markov decision processes (MDPs) with a constant discount factor in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most O((mn/(1−γ)) log(n/(1−γ))) iterations, where n is the number of states, m is the total number of actions in the MDP, and 0 < γ < 1 is the discount factor. We improve Ye’s analysis in two respects. First, we improve the bound given by Ye and show that Howard’s policy iteration algorithm actually terminates after at most O((m/(1−γ)) log(n/(1−γ))) iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard’s policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games, resolving a long-standing open problem. Combined with other recent results, this provides a complete characterization of the complexity of the standard strategy iteration algorithm for 2-player turn-based stochastic games: it is strongly polynomial for a fixed discount factor, and exponential otherwise.
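
To make the algorithm whose iteration count is bounded here concrete, below is a minimal sketch of Howard’s policy iteration for a discounted MDP. It is an illustration under assumed conventions (a tabular MDP encoded as NumPy arrays P and R, the helper name policy_iteration), not code from the paper; the strategy iteration algorithm analyzed in the paper generalizes this loop to two players by alternating improvement steps for one player with best-response computations for the other.

```python
import numpy as np

def policy_iteration(P, R, gamma):
    """Howard's policy iteration on a tabular discounted MDP.
    P: (n, a, n) transition probabilities; R: (n, a) one-step rewards;
    gamma: discount factor with 0 < gamma < 1. Returns (policy, values)."""
    n, a, _ = P.shape
    policy = np.zeros(n, dtype=int)            # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = P[np.arange(n), policy]         # (n, n) transitions under pi
        r_pi = R[np.arange(n), policy]         # (n,) rewards under pi
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Howard's improvement rule: switch to a strictly better action
        # in every improvable state simultaneously.
        q = R + gamma * (P @ v)                # (n, a) action values
        improvable = q.max(axis=1) > q[np.arange(n), policy] + 1e-10
        if not improvable.any():
            return policy, v                   # no improving switch: optimal
        policy = np.where(improvable, q.argmax(axis=1), policy)

# Tiny usage example on a random 5-state, 3-action MDP (hypothetical data).
rng = np.random.default_rng(0)
n, a = 5, 3
P = rng.random((n, a, n))
P /= P.sum(axis=2, keepdims=True)              # normalize rows to distributions
R = rng.random((n, a))
pi, v = policy_iteration(P, R, gamma=0.9)
print(pi, v)
```

Each pass of the loop is one "iteration" in the sense of the stated bounds: its cost is polynomial (a linear solve plus a greedy scan), so bounding the number of iterations by a polynomial in n and m alone, for fixed γ, is what makes the overall method strongly polynomial.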