Polynomial value iteration algorithms for deterministic MDPs

  • Authors:
  • Omid Madani

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, AB, Canada

  • Venue:
  • UAI'02: Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence
  • Year:
  • 2002

Abstract

Value iteration is a commonly used and empirically competitive method for solving many Markov decision process problems. However, it is known that value iteration has only pseudopolynomial complexity in general. We establish a somewhat surprising polynomial bound for value iteration on deterministic Markov decision process (DMDP) problems. We show that the basic value iteration procedure converges to the highest average reward cycle on a DMDP problem in Θ(n²) iterations, or Θ(mn²) total time, where n denotes the number of states and m the number of edges. We give two extensions of value iteration that solve the DMDP in Θ(mn) time. We explore the analysis of policy iteration algorithms and report on an empirical study of value iteration showing that its convergence is much faster on random sparse graphs.
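
The sketch below is a minimal illustration (not the paper's exact procedure or its faster extensions) of undiscounted value iteration on a deterministic MDP viewed as a directed graph with a reward on each edge. The graph, state numbering, and function name are hypothetical; the point is only that the per-step value V_k(s)/k approaches the mean reward of the best cycle reachable from s, which is the quantity the paper's Θ(n²)-iteration bound concerns.

```python
def value_iteration(edges, num_states, num_iters):
    """Undiscounted value iteration on a deterministic MDP.

    edges: list of (u, v, reward) triples, one per action/edge.
    Returns the value vector V_k after num_iters Bellman backups.
    """
    v = [0.0] * num_states
    for _ in range(num_iters):
        new_v = [float("-inf")] * num_states
        for u, w, r in edges:
            # Deterministic Bellman backup: best immediate reward
            # plus the previous value of the successor state.
            new_v[u] = max(new_v[u], r + v[w])
        v = new_v
    return v


if __name__ == "__main__":
    # Hypothetical 4-state DMDP (states 0..3), edges as (from, to, reward):
    #   cycle 0 <-> 1 has mean reward 1.0,
    #   cycle 2 <-> 3 has mean reward 2.0 and is reachable via edge 1 -> 2.
    edges = [
        (0, 1, 1.0), (1, 0, 1.0),   # low-reward cycle
        (1, 2, 0.0),                # transient edge into the better cycle
        (2, 3, 4.0), (3, 2, 0.0),   # high-reward cycle, mean 2.0
    ]
    k = 1000
    v = value_iteration(edges, num_states=4, num_iters=k)
    # V_k(s)/k converges to the highest mean-reward cycle reachable from s,
    # here 2.0 for every state.
    print([round(x / k, 3) for x in v])
```

In this toy run the printed per-step values are all close to 2.0, the mean reward of the best cycle; each iteration costs O(m), matching the Θ(mn²) total-time figure quoted in the abstract when Θ(n²) iterations are performed.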