Policy iteration is a popular technique for solving Markov decision processes (MDPs). It is easy to describe and implement, and it performs excellently in practice, yet little is known about its complexity: the best upper bound remains exponential, and the best lower bound is a trivial Ω(n) on the number of iterations, where n is the number of states.

This paper improves the upper bound to a polynomial for policy iteration on MDPs with special graph structure. Our analysis rests on the connection between policy iteration and Newton's method for finding the zero of a convex function, and it offers an explanation of why policy iteration is fast. It also yields polynomial bounds on several variants of policy iteration for MDPs whose linear programming formulation requires at most two variables per inequality (MDP(2)). The MDP(2) class includes deterministic MDPs under both the discounted and the average reward criteria. The resulting run-time bounds include O(mn² log m log W) for MDP(2) and O(mn² log m) for deterministic MDPs, where m denotes the number of actions and W denotes the magnitude of the largest number in the problem description.
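As a point of reference for readers unfamiliar with the algorithm, the sketch below shows plain policy iteration for a finite discounted MDP: repeatedly evaluate the current policy by solving a linear system, then improve it greedily. It is a minimal illustration under stated assumptions, not the paper's analysis or its MDP(2) variants; the function name, the array layout (transitions indexed as action × state × state, rewards as action × state), and the stopping rule are choices made for the example.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, max_iters=1000):
    """Plain policy iteration for a finite discounted MDP (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, s'] = transition probability.
    R: array of shape (A, S), expected immediate reward for action a in state s.
    gamma: discount factor in [0, 1).
    """
    A, S, _ = P.shape
    policy = np.zeros(S, dtype=int)            # start from an arbitrary policy
    for _ in range(max_iters):
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi for the value V.
        P_pi = P[policy, np.arange(S), :]      # S x S transition matrix under the policy
        R_pi = R[policy, np.arange(S)]
        V = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)
        # Policy improvement: greedy one-step lookahead on the evaluated values.
        Q = R + gamma * (P @ V)                # Q-values, shape (A, S)
        new_policy = Q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            break                              # no state changes its action: policy is optimal
        policy = new_policy
    return policy, V
```

In this generic form, the number of iterations until the greedy step stops changing the policy is exactly the quantity whose best known bounds the abstract describes (exponential above, Ω(n) below in general); the paper's contribution is polynomial iteration bounds for the structured MDP(2) and deterministic cases.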