Process-oriented planning and average-reward optimality

  • Authors:
  • Craig Boutilier; Martin L. Puterman

  • Affiliations:
  • Department of Computer Science, University of British Columbia, Vancouver, BC, Canada; Faculty of Commerce, University of British Columbia, Vancouver, BC, Canada

  • Venue:
  • IJCAI'95: Proceedings of the 14th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1995

Abstract

We argue that many AI planning problems should be viewed as process-oriented, where the aim is to produce a policy or behavior strategy with no termination condition in mind, as opposed to goal-oriented. The full power of Markov decision models, adopted recently for AI planning, becomes apparent with process-oriented problems. The question of appropriate optimality criteria becomes more critical in this case; we argue that average-reward optimality is most suitable. While the construction of average-optimal policies involves a number of subtleties and computational difficulties, certain aspects of the problem can be solved using compact action representations such as Bayes nets. In particular, we provide an algorithm that identifies the structure of the Markov process underlying a planning problem - a crucial element of constructing average-optimal policies - without explicit enumeration of the problem state space.
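As a point of reference for the chain-structure step the abstract mentions, the sketch below shows how the structure of a finite Markov chain - its recurrent classes and transient states - can be identified from an explicit transition matrix. This is not the paper's algorithm: the paper's contribution is precisely to recover this structure from a compact Bayes-net action representation without enumerating states, whereas this illustration enumerates them. The function name `chain_structure` and the SciPy-based SCC approach are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def chain_structure(P, tol=1e-12):
    """Partition the states of a finite Markov chain with transition
    matrix P into recurrent classes and transient states.

    A strongly connected component (SCC) of the transition graph is a
    recurrent class exactly when no probability mass leaves it; every
    state outside such a closed class is transient.
    """
    n = P.shape[0]
    # Edge i -> j whenever the one-step transition probability is positive.
    graph = csr_matrix(P > tol)
    _, labels = connected_components(graph, directed=True, connection="strong")

    recurrent, transient = [], []
    for c in np.unique(labels):
        states = np.flatnonzero(labels == c)
        others = np.setdiff1d(np.arange(n), states)
        # The class is closed iff no transition leads outside it.
        leaks = P[np.ix_(states, others)].sum() > tol
        (transient if leaks else recurrent).append(states.tolist())
    return recurrent, sorted(s for grp in transient for s in grp)

if __name__ == "__main__":
    # States 0-1 form a closed (recurrent) class; states 2-3 leak into it.
    P = np.array([[0.5, 0.5, 0.0, 0.0],
                  [0.4, 0.6, 0.0, 0.0],
                  [0.1, 0.0, 0.2, 0.7],
                  [0.0, 0.0, 0.3, 0.7]])
    print(chain_structure(P))   # ([[0, 1]], [2, 3])
```

This structure matters for average-reward planning because the long-run average reward of a policy can differ across recurrent classes (the multichain case), so algorithms for constructing average-optimal policies depend on knowing which classes exist - hence the abstract's description of this step as crucial.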