On the undecidability of probabilistic planning and related stochastic optimization problems

  • Authors:
  • Omid Madani; Steve Hanks; Anne Condon

  • Affiliations:
  • Department of Computing Science, University of Alberta, Edmonton, AB T6G 2E8, Canada; Institute of Technology, University of Washington, Tacoma, WA; Department of Computer Science, 2366 Main Mall, University of British Columbia, Vancouver, BC V6T 1Z4, Canada

  • Venue:
  • Artificial Intelligence - special issue on planning with uncertainty and incomplete information
  • Year:
  • 2003

Abstract

Automated planning, the problem of how an agent achieves a goal given a repertoire of actions, is one of the foundational and most widely studied problems in the AI literature. The original formulation of the problem makes strong assumptions regarding the agent's knowledge and control over the world, namely that its information is complete and correct, and that the results of its actions are deterministic and known. Recent research in planning under uncertainty has endeavored to relax these assumptions, providing formal and computational models wherein the agent has incomplete or noisy information about the world and has noisy sensors and effectors. This research has mainly taken one of two approaches: extend the classical planning paradigm to a semantics that admits uncertainty, or adopt another framework for approaching the problem, most commonly the Markov Decision Process (MDP) model. This paper presents a complexity analysis of planning under uncertainty. It begins with the "probabilistic classical planning" problem, showing that problem to be formally undecidable. This fundamental result is then applied to a broad class of stochastic optimization problems, in brief, any problem formulation in which the agent (a) operates over an infinite or indefinite time horizon, and (b) has available only probabilistic information about the system's state. Undecidability is established for policy-existence problems for partially observable infinite-horizon Markov decision processes under discounted and undiscounted total reward models, average-reward models, and state-avoidance models. The results also apply to the corresponding approximation problems with undiscounted objective functions. The paper answers a significant open question raised by Papadimitriou and Tsitsiklis [Math. Oper. Res. 12 (3) (1987) 441-450] about the complexity of infinite-horizon POMDPs.
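
For readers unfamiliar with the decision problems named above, the following is a minimal sketch of the discounted policy-existence problem in standard POMDP notation; the symbols (S, A, Omega, b_0, gamma, theta) are conventional and are assumed here, not drawn from this listing's own text.

% A POMDP is a tuple (S, A, Omega, T, Z, r): finite sets of states,
% actions, and observations; transition probabilities T(s' | s, a);
% observation probabilities Z(o | s', a); and rewards r(s, a).
% A policy \pi maps observation histories to actions.
% Discounted policy-existence: given a POMDP, an initial belief b_0,
% a discount factor \gamma \in (0, 1), and a threshold \theta,
% decide whether some policy \pi satisfies
\[
  V^{\pi}(b_0)
    \;=\;
  \mathbb{E}^{\pi}\!\left[\, \sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t) \;\middle|\; b_0 \right]
    \;\ge\; \theta .
\]
% Per the abstract, this problem and its undiscounted, average-reward,
% and state-avoidance analogues over infinite horizons are undecidable.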