Complexity of probabilistic planning under average rewards

  • Authors:
  • Jussi Rintanen

  • Affiliations:
  • Albert-Ludwigs-Universität Freiburg, Institut für Informatik, Freiburg im Breisgau, Germany

  • Venue:
  • IJCAI'01: Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 1
  • Year:
  • 2001

Abstract

A general and expressive model of sequential decision making under uncertainty is provided by the framework of Markov decision processes (MDPs). Complex applications with very large state spaces are best modelled implicitly (instead of explicitly, by enumerating the state space), for example as precondition-effect operators, the representation used in AI planning. Such representations are very powerful, but they make the construction of policies/plans computationally very complex. In many applications, the average reward per unit time is the relevant rationality criterion, as opposed to the more widely used discounted reward criterion. To provide a solid basis for the development of efficient planning algorithms, the computational complexity of the decision problems related to average rewards has to be analyzed. We investigate the complexity of the policy/plan existence problem for MDPs under the average reward criterion, with MDPs represented in terms of conditional probabilistic precondition-effect operators. We consider policies with and without memory, and with different degrees of sensing/observability. The unrestricted policy existence problem for the partially observable cases was already known to be undecidable. The results place the remaining computational problems in the complexity classes EXP and NEXP (deterministic and nondeterministic exponential time).
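
As a point of reference for the criterion discussed in the abstract (the notation below is a standard textbook formulation, not necessarily the paper's own, and is written here for a memoryless, fully observable policy π), the average reward of a policy from an initial state s0 is the long-run expected reward per step, in contrast to the discounted sum of rewards:

```latex
% Standard definition of the average reward (gain) of a policy \pi,
% shown only as an illustration of the criterion; the paper's own
% notation and policy class may differ.
\mathrm{avg}(\pi, s_0) \;=\;
  \liminf_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}\!\left[\,\sum_{t=0}^{N-1} r\bigl(s_t, \pi(s_t)\bigr) \,\middle|\, s_0 \right],
\qquad\text{versus}\qquad
\mathbb{E}\!\left[\,\sum_{t \geq 0} \gamma^{t}\, r\bigl(s_t, \pi(s_t)\bigr) \,\middle|\, s_0 \right],
\; 0 \leq \gamma < 1 .
```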