Value iteration is a commonly used and empirically competitive method for solving many Markov decision process problems. However, value iteration is known to have only pseudopolynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision process (DMDP) problems. We show that the basic value iteration procedure converges to the highest-average-reward cycle on a DMDP in Θ(n²) iterations, or Θ(mn²) total time, where n denotes the number of states and m the number of edges. We give two extensions of value iteration that solve the DMDP in Θ(mn) time. We explore the analysis of policy iteration algorithms and report on an empirical study of value iteration showing that its convergence is much faster on random sparse graphs.
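To illustrate the procedure the abstract analyzes, here is a minimal sketch of basic undiscounted value iteration on a DMDP. The graph encoding, function name, and toy example are illustrative assumptions, not taken from the paper; the key property shown is that the per-step value growth V_t(s)/t approaches the maximum mean cycle reward.

```python
# Illustrative sketch (not the paper's code): basic value iteration on a DMDP.
# Each state maps to a list of (next_state, reward) edges; actions are edges.

def value_iteration(edges, num_iters):
    """Run num_iters Bellman backups of undiscounted value iteration.

    edges: dict mapping state -> list of (next_state, reward) pairs
    Returns the value table after num_iters iterations.
    """
    states = list(edges)
    v = {s: 0.0 for s in states}
    for _ in range(num_iters):
        # Deterministic Bellman backup: best one-step reward plus next value.
        v = {s: max(r + v[s2] for s2, r in edges[s]) for s in states}
    return v

# Toy DMDP: the 2-cycle a -> b -> a has mean reward (3 + 1) / 2 = 2,
# beating the self-loop at a with mean reward 1.
edges = {
    'a': [('b', 3.0), ('a', 1.0)],
    'b': [('a', 1.0)],
}
t = 100
v = value_iteration(edges, t)
print(round(v['a'] / t, 2))  # prints 2.0, the highest average reward cycle value
```

After enough iterations, the greedy policy with respect to the value table follows the highest-average-reward cycle; the paper's result concerns how many iterations "enough" is (Θ(n²) for this basic procedure).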