Complexity of finite-horizon Markov decision process problems

  • Authors:
  • Martin Mundhenk; Judy Goldsmith; Christopher Lusena; Eric Allender

  • Affiliations:
  • Univ. Trier, Trier, Germany; Univ. of Kentucky, Lexington, KY; Univ. of Kentucky, Lexington, KY; Rutgers Univ., Piscataway, NJ

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2000

Abstract

Controlled stochastic systems occur in science, engineering, manufacturing, the social sciences, and many other contexts. If the system is modeled as a Markov decision process (MDP) and will run ad infinitum, the optimal control policy can be computed in polynomial time using linear programming. The problems considered here assume that the time the process will run is finite and based on the size of the input. There are many factors that compound the complexity of computing the optimal policy. For instance, if the controller does not have complete information about the state of the system, or if the system is represented in some very succinct manner, the optimal policy is provably not computable in time polynomial in the size of the input. We analyze the computational complexity of evaluating policies and of determining whether a sufficiently good policy exists for an MDP, based on a number of confounding factors, including the observability of the system state; the succinctness of the representation; the type of policy; and even the number of actions relative to the number of states. In almost every case, we show that the decision problem is complete for some known complexity class. Some of these results are familiar from work by Papadimitriou and Tsitsiklis and others, but some, such as our PL-completeness proofs, are surprising. We include proofs of completeness for natural problems in the as-yet little-studied class NP^PP.
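
The abstract's remark about the infinite-horizon case refers to the classical linear-programming formulation of discounted MDPs. The following is a minimal sketch of that formulation, not code from the paper: the transition tensor P, reward matrix R, and discount factor gamma are made-up example data, and SciPy's general-purpose `linprog` solver stands in for any LP solver.

```python
import numpy as np
from scipy.optimize import linprog

# Sketch of the classical LP for an infinite-horizon *discounted* MDP.
# Variables: one value v(s) per state.
# LP:  minimize  sum_s v(s)
#      s.t.      v(s) >= R[s,a] + gamma * sum_t P[a,s,t] * v(t)
#                for every state s and action a.

n_states, n_actions = 3, 2
gamma = 0.9  # illustrative discount factor, not from the paper

rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a,s,t]
R = rng.random((n_states, n_actions))                             # R[s,a]

# Rewrite each constraint as (gamma * P[a,s] - e_s) @ v <= -R[s,a],
# matching linprog's A_ub @ x <= b_ub convention.
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        row = gamma * P[a, s].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c=np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states)
v = res.x  # optimal value function

# Recover a greedy (optimal stationary) policy from the values.
q = R + gamma * np.einsum('ast,t->sa', P, v)  # Q[s,a]
policy = q.argmax(axis=1)
print("values:", np.round(v, 3), "policy:", policy)
```

The LP has one variable per state and one constraint per state-action pair, so its size is polynomial in an explicit (flat) MDP description. For contrast, fully observable, explicitly represented finite-horizon MDPs can be solved by backward induction over the horizon; the paper's hardness results concern the variants (partial observability, succinct representations) where no polynomial-time method of either kind is available.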