The Simplex and Policy-Iteration Methods Are Strongly Polynomial for the Markov Decision Problem with a Fixed Discount Rate

Authors:
Yinyu Ye
Affiliations:
Department of Management Science and Engineering, Stanford University, Stanford, California 94305
Venue:
Mathematics of Operations Research
Year:
2011

Citing 11

A new polynomial-time algorithm for linear programming. Combinatorica.
Dynamic programming: deterministic and stochastic models.
The complexity of Markov decision processes. Mathematics of Operations Research.
Optimality of stationary halting policies and finite termination of successive approximations. Mathematics of Operations Research.
A theory on extending algorithms for parametric problems. Mathematics of Operations Research.
A primal-dual interior point method whose running time depends only on the constraint matrix. Mathematical Programming: Series A and B.
Markov Decision Processes: Discrete Stochastic Dynamic Programming.
A New Complexity Result on Solving the Markov Decision Problem. Mathematics of Operations Research.
Exponential lower bounds for policy iteration. ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming: Part II.
On the complexity of policy iteration. UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence.
On the complexity of solving Markov decision problems. UAI'95 Proceedings of the Eleventh conference on Uncertainty in artificial intelligence.

Cited 1

Strategy Iteration Is Strongly Polynomial for 2-Player Turn-Based Stochastic Games with a Constant Discount Factor. Journal of the ACM (JACM).

Abstract

We prove that the classic policy-iteration method [Howard, R. A. 1960. Dynamic Programming and Markov Processes. MIT, Cambridge] and the original simplex method with the most-negative-reduced-cost pivoting rule of Dantzig are strongly polynomial-time algorithms for solving the Markov decision problem (MDP) with a fixed discount rate. Furthermore, the computational complexity of the policy-iteration and simplex methods is superior to that of the only known strongly polynomial-time interior-point algorithm [Ye, Y. 2005. A new complexity result on solving the Markov decision problem. Math. Oper. Res. 30(3) 733-749] for solving this problem. The result is surprising because the simplex method with the same pivoting rule was shown to be exponential for solving a general linear programming problem [Klee, V., G. J. Minty. 1972. How good is the simplex method? Technical report. O. Shisha, ed. Inequalities III. Academic Press, New York], the simplex method with the smallest index pivoting rule was shown to be exponential for solving an MDP regardless of discount rates [Melekopoglou, M., A. Condon. 1994. On the complexity of the policy improvement algorithm for Markov decision processes. INFORMS J. Comput. 6(2) 188-192], and the policy-iteration method was recently shown to be exponential for solving undiscounted MDPs under the average cost criterion. We also extend the result to solving MDPs with transient substochastic transition matrices whose spectral radii are uniformly below one.
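
To make the two methods named in the abstract concrete, the following is a minimal Python sketch of a cost-minimizing discounted MDP solved either by Howard-style policy iteration (switch every state that has a negative reduced cost) or by a simplex-style step with Dantzig's rule (switch only the single state-action pair with the most negative reduced cost). It is an illustration under stated assumptions, not the paper's analysis or code: the function names (`solve_linear`, `evaluate`, `reduced_cost`, `policy_iteration`), the data layout, and the two-state toy instance are all hypothetical, and exact `Fraction` arithmetic is used only so that the stopping test is free of rounding error.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination (exact, entries are Fractions)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        # I - gamma * P is nonsingular for gamma < 1, so a pivot always exists.
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        inv = 1 / M[col][col]
        M[col] = [x * inv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

def evaluate(mdp, policy, gamma):
    """Cost-to-go of a fixed policy: solve (I - gamma * P_pi) v = c_pi."""
    n = len(mdp)
    A = [[Fraction(int(i == j)) - gamma * mdp[i][policy[i]][1][j]
          for j in range(n)] for i in range(n)]
    b = [Fraction(mdp[i][policy[i]][0]) for i in range(n)]
    return solve_linear(A, b)

def reduced_cost(mdp, v, gamma, i, a):
    """Reduced cost of action a in state i relative to the values v."""
    cost, probs = mdp[i][a]
    return cost + gamma * sum(p * x for p, x in zip(probs, v)) - v[i]

def policy_iteration(mdp, gamma, rule="howard"):
    """rule="howard": switch all improvable states; rule="dantzig": switch one."""
    policy = [0] * len(mdp)
    steps = 0
    while True:
        v = evaluate(mdp, policy, gamma)
        best = []
        for i in range(len(mdp)):
            a = min(range(len(mdp[i])),
                    key=lambda a: reduced_cost(mdp, v, gamma, i, a))
            best.append((reduced_cost(mdp, v, gamma, i, a), i, a))
        improving = [t for t in best if t[0] < 0]
        if not improving:
            return policy, v, steps  # no negative reduced cost: policy is optimal
        if rule == "dantzig":
            improving = [min(improving)]  # most negative reduced cost only
        for _, i, a in improving:
            policy[i] = a
        steps += 1

# Hypothetical toy instance: mdp[state][action] = (cost, transition probabilities).
toy_mdp = [
    [(2, [1, 0]), (1, [0, 1])],  # state 0: stay at cost 2, or move to state 1 at cost 1
    [(3, [0, 1]), (0, [0, 1])],  # state 1: stay at cost 3, or stay at cost 0
]
gamma = Fraction(1, 2)  # fixed discount rate
howard_result = policy_iteration(toy_mdp, gamma, rule="howard")
dantzig_result = policy_iteration(toy_mdp, gamma, rule="dantzig")
```

The only difference between the two rules in this sketch is how many improving state-action pairs are switched per iteration, which is the distinction the abstract draws between policy iteration and the simplex method with Dantzig's rule. The sketch says nothing about iteration bounds; those are the subject of the paper itself.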