Polynomial value iteration algorithms for deterministic MDPs

  • Authors:
  • Omid Madani

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, AB, Canada

  • Venue:
  • UAI'02: Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence
  • Year:
  • 2002

Abstract

Value iteration is a commonly used and empirically competitive method for solving many Markov decision process problems. However, it is known that value iteration has only pseudopolynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision process (DMDP) problems. We show that the basic value iteration procedure converges to the highest average reward cycle on a DMDP problem in Θ(n²) iterations, or Θ(mn²) total time, where n denotes the number of states and m the number of edges. We give two extensions of value iteration that solve the DMDP in Θ(mn) time. We explore the analysis of policy iteration algorithms and report on an empirical study of value iteration showing that its convergence is much faster on random sparse graphs.
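
The sketch below is a minimal illustration (not the paper's exact procedure or its faster extensions) of undiscounted value iteration on a deterministic MDP viewed as a directed graph with a reward on each edge. The graph, state numbering, and function name are hypothetical; the point is only that the per-step value V_k(s)/k approaches the mean reward of the best cycle reachable from s, which is the quantity the paper's Θ(n²)-iteration bound concerns.

```python
def value_iteration(edges, num_states, num_iters):
    """Undiscounted value iteration on a deterministic MDP.

    edges: list of (u, v, reward) triples, one per action/edge.
    Returns the value vector V_k after num_iters Bellman backups.
    """
    v = [0.0] * num_states
    for _ in range(num_iters):
        new_v = [float("-inf")] * num_states
        for u, w, r in edges:
            # Deterministic Bellman backup: best immediate reward
            # plus the previous value of the successor state.
            new_v[u] = max(new_v[u], r + v[w])
        v = new_v
    return v


if __name__ == "__main__":
    # Hypothetical 4-state DMDP (states 0..3), edges as (from, to, reward):
    #   cycle 0 <-> 1 has mean reward 1.0,
    #   cycle 2 <-> 3 has mean reward 2.0 and is reachable via edge 1 -> 2.
    edges = [
        (0, 1, 1.0), (1, 0, 1.0),   # low-reward cycle
        (1, 2, 0.0),                # transient edge into the better cycle
        (2, 3, 4.0), (3, 2, 0.0),   # high-reward cycle, mean 2.0
    ]
    k = 1000
    v = value_iteration(edges, num_states=4, num_iters=k)
    # V_k(s)/k converges to the highest mean-reward cycle reachable from s,
    # here 2.0 for every state.
    print([round(x / k, 3) for x in v])
```

In this toy run the printed per-step values are all close to 2.0, the mean reward of the best cycle; each iteration costs O(m), matching the Θ(mn²) total-time figure quoted in the abstract when Θ(n²) iterations are performed.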