Complexity of probabilistic planning under average rewards

  • Authors:
  • Jussi Rintanen

  • Affiliations:
  • Albert-Ludwigs-Universität Freiburg, Institut für Informatik, Freiburg im Breisgau, Germany

  • Venue:
  • IJCAI'01: Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 1
  • Year:
  • 2001

Abstract

A general and expressive model of sequential decision making under uncertainty is provided by the framework of Markov decision processes (MDPs). Complex applications with very large state spaces are best modelled implicitly (instead of explicitly, by enumerating the state space), for example as precondition-effect operators, the representation used in AI planning. Such representations are very powerful, but they make the construction of policies/plans computationally very complex. In many applications, the average reward per unit time is the relevant rationality criterion, as opposed to the more widely used discounted reward criterion. To provide a solid basis for the development of efficient planning algorithms, the computational complexity of the decision problems related to average rewards has to be analyzed. We investigate the complexity of the policy/plan existence problem for MDPs under the average reward criterion, with MDPs represented in terms of conditional probabilistic precondition-effect operators. We consider policies with and without memory, and with different degrees of sensing/observability. The unrestricted policy existence problem for the partially observable cases was already known to be undecidable. The results place the remaining computational problems in the complexity classes EXP and NEXP (deterministic and nondeterministic exponential time).
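
As a point of reference for the criterion discussed in the abstract (the notation below is a standard textbook formulation, not necessarily the paper's own, and is written here for a memoryless, fully observable policy π), the average reward of a policy from an initial state s0 is the long-run expected reward per step, in contrast to the discounted sum of rewards:

```latex
% Standard definition of the average reward (gain) of a policy \pi,
% shown only as an illustration of the criterion; the paper's own
% notation and policy class may differ.
\mathrm{avg}(\pi, s_0) \;=\;
  \liminf_{N \to \infty} \frac{1}{N}\,
  \mathbb{E}\!\left[\,\sum_{t=0}^{N-1} r\bigl(s_t, \pi(s_t)\bigr) \,\middle|\, s_0 \right],
\qquad\text{versus}\qquad
\mathbb{E}\!\left[\,\sum_{t \geq 0} \gamma^{t}\, r\bigl(s_t, \pi(s_t)\bigr) \,\middle|\, s_0 \right],
\; 0 \leq \gamma < 1 .
```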