A subexponential randomized algorithm for the simple stochastic game problem. Information and Computation.
The complexity of mean payoff games on graphs. Theoretical Computer Science.
Competitive Markov decision processes.
Linear programming, the simplex algorithm and simple polytopes. Mathematical Programming, Series A and B, special issue: papers from ISMP97, the 16th International Symposium on Mathematical Programming, Lausanne, EPFL.
A Discrete Strategy Improvement Algorithm for Solving Parity Games. CAV '00: Proceedings of the 12th International Conference on Computer Aided Verification.
Algorithms for sequential decision-making.
Combinatorial structure and randomized subexponential algorithms for infinite games. Theoretical Computer Science.
A combinatorial strongly subexponential strategy improvement algorithm for mean payoff games. Discrete Applied Mathematics.
A New Complexity Result on Solving the Markov Decision Problem. Mathematics of Operations Research.
A Simple P-Matrix Linear Complementarity Problem for Discounted Games. CiE '08: Proceedings of the 4th Conference on Computability in Europe: Logic and Theory of Algorithms.
A Deterministic Subexponential Algorithm for Solving Parity Games. SIAM Journal on Computing.
The Complexity of Solving Stochastic Games on Graphs. ISAAC '09: Proceedings of the 20th International Symposium on Algorithms and Computation.
Exponential lower bounds for policy iteration. ICALP '10: Proceedings of the 37th International Colloquium on Automata, Languages and Programming, Part II; Mathematics of Operations Research.
On the complexity of policy iteration. UAI '99: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence.
On the complexity of solving Markov decision problems. UAI '95: Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence.
Simple stochastic games and P-matrix generalized linear complementarity problems. FCT '05: Proceedings of the 15th International Conference on Fundamentals of Computation Theory.
Ye [2011] showed recently that the simplex method with Dantzig’s pivoting rule, as well as Howard’s policy iteration algorithm, solve discounted Markov decision processes (MDPs) with a constant discount factor in strongly polynomial time. More precisely, Ye showed that both algorithms terminate after at most O(mn/(1−γ) · log(n/(1−γ))) iterations, where n is the number of states, m is the total number of actions in the MDP, and 0 ≤ γ < 1 is the discount factor. We improve Ye’s analysis in two respects. First, we tighten Ye’s bound and show that Howard’s policy iteration algorithm in fact terminates after at most O(m/(1−γ) · log(n/(1−γ))) iterations. Second, and more importantly, we show that the same bound applies to the number of iterations performed by the strategy iteration (or strategy improvement) algorithm, a generalization of Howard’s policy iteration algorithm used for solving 2-player turn-based stochastic games with discounted zero-sum rewards. This provides the first strongly polynomial algorithm for solving these games, settling a long-standing open problem. Combined with other recent results, this gives a complete characterization of the complexity of the standard strategy iteration algorithm for 2-player turn-based stochastic games: it is strongly polynomial for a fixed discount factor, and exponential otherwise.
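To make the object of the abstract concrete, here is a minimal sketch of Howard's policy iteration for a tabular discounted MDP. It is not the paper's construction: the function name, the dense-matrix setup, and the assumption that every action is available in every state are all simplifications for illustration. Each round evaluates the current policy exactly by solving a linear system, then switches every state to a greedy action; the paper's bound concerns how many such rounds can occur.

```python
import numpy as np

def policy_iteration(P, r, gamma):
    """Howard's policy iteration for a discounted MDP (illustrative sketch).

    P[a] is the n x n transition matrix of action a, r[a] its length-n
    reward vector, and 0 <= gamma < 1 is the discount factor.  For
    simplicity every action is assumed available in every state.
    """
    n = P[0].shape[0]
    policy = np.zeros(n, dtype=int)              # start from an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n)])
        r_pi = np.array([r[policy[s]][s] for s in range(n)])
        v = np.linalg.solve(np.eye(n) - gamma * P_pi, r_pi)
        # Policy improvement: every state switches to a greedy action.
        Q = np.array([r[a] + gamma * P[a] @ v for a in range(len(P))])
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):   # no improving switch: optimal
            return policy, v
        policy = new_policy
```

On a toy 2-state MDP where action 1 always pays reward 1 and action 0 pays 0, the algorithm settles on action 1 in both states with value 1/(1−γ). The strategy iteration algorithm for 2-player games analyzed in the paper generalizes this loop: the maximizer improves its strategy greedily while each evaluation step accounts for an optimal response by the minimizer.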