Process-oriented planning and average-reward optimality

  • Authors:
  • Craig Boutilier; Martin L. Puterman

  • Affiliations:
  • Department of Computer Science, University of British Columbia, Vancouver, BC, Canada; Faculty of Commerce, University of British Columbia, Vancouver, BC, Canada

  • Venue:
  • IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1995

Abstract

We argue that many AI planning problems should be viewed as process-oriented, where the aim is to produce a policy or behavior strategy with no termination condition in mind, as opposed to goal-oriented. The full power of Markov decision models, adopted recently for AI planning, becomes apparent with process-oriented problems. The question of appropriate optimality criteria becomes more critical in this case; we argue that average-reward optimality is most suitable. While the construction of average-optimal policies involves a number of subtleties and computational difficulties, certain aspects of the problem can be solved using compact action representations such as Bayes nets. In particular, we provide an algorithm that identifies the structure of the Markov process underlying a planning problem - a crucial element of constructing average-optimal policies - without explicit enumeration of the problem state space.
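As a point of reference for the chain-structure step the abstract mentions, the sketch below shows how the structure of a finite Markov chain - its recurrent classes and transient states - can be identified from an explicit transition matrix. This is not the paper's algorithm: the paper's contribution is precisely to recover this structure from a compact Bayes-net action representation without enumerating states, whereas this illustration enumerates them. The function name `chain_structure` and the SciPy-based SCC approach are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def chain_structure(P, tol=1e-12):
    """Partition the states of a finite Markov chain with transition
    matrix P into recurrent classes and transient states.

    A strongly connected component (SCC) of the transition graph is a
    recurrent class exactly when no probability mass leaves it; every
    state outside such a closed class is transient.
    """
    n = P.shape[0]
    # Edge i -> j whenever the one-step transition probability is positive.
    graph = csr_matrix(P > tol)
    _, labels = connected_components(graph, directed=True, connection="strong")

    recurrent, transient = [], []
    for c in np.unique(labels):
        states = np.flatnonzero(labels == c)
        others = np.setdiff1d(np.arange(n), states)
        # The class is closed iff no transition leads outside it.
        leaks = P[np.ix_(states, others)].sum() > tol
        (transient if leaks else recurrent).append(states.tolist())
    return recurrent, sorted(s for grp in transient for s in grp)

if __name__ == "__main__":
    # States 0-1 form a closed (recurrent) class; states 2-3 leak into it.
    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.4, 0.6, 0.0, 0.0],
                  [0.1, 0.0, 0.2, 0.7],
                  [0.0, 0.0, 0.3, 0.7]])
    print(chain_structure(P))   # ([[0, 1]], [2, 3])
```

This structure matters for average-reward planning because the long-run average reward of a policy can differ across recurrent classes (the multichain case), so algorithms for constructing average-optimal policies depend on knowing which classes exist - hence the abstract's description of this step as crucial.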