Rewarding behaviors

  • Authors:
  • Fahiem Bacchus; Craig Boutilier; Adam Grove

  • Affiliations:
  • Dept. Computer Science, University of Waterloo, Waterloo, Ontario, Canada; Dept. Computer Science, University of British Columbia, Vancouver, B.C., Canada; NEC Research Institute, Princeton, NJ

  • Venue:
  • AAAI'96 Proceedings of the thirteenth national conference on Artificial intelligence - Volume 2
  • Year:
  • 1996

Abstract

Markov decision processes (MDPs) are a very popular tool for decision theoretic planning (DTP), partly because of the well-developed, expressive theory that includes effective solution techniques. But the Markov assumption--that dynamics and rewards depend on the current state only, and not on history--is often inappropriate. This is especially true of rewards: we frequently wish to associate rewards with behaviors that extend over time. Of course, such reward processes can be encoded in an MDP should we have a rich enough state space (where states encode enough history). However, it is often difficult to "hand craft" suitable state spaces that encode an appropriate amount of history. We consider this problem in the case where non-Markovian rewards are encoded by assigning values to formulas of a temporal logic. These formulas characterize the value of temporally extended behaviors. We argue that this allows a natural representation of many commonly encountered non-Markovian rewards. The main result is an algorithm that, given a decision process with non-Markovian rewards expressed in this manner, automatically constructs an equivalent MDP (with Markovian reward structure), allowing optimal policy construction using standard techniques.
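
The core idea of the abstract can be illustrated with a minimal, hypothetical sketch (not the paper's construction or its temporal logic): a reward that depends on history, such as "receive value once goal state g has ever been visited," becomes Markovian if each state is augmented with a single bit recording whether that condition has been satisfied. All names, the toy dynamics, and the reward values below are assumptions made for illustration only.

```python
# Illustrative sketch only: making a simple "past"-dependent reward
# (value 10 per step once state 'g' has ever been visited) Markovian
# by augmenting base states with one bit of history.
# The states, actions, dynamics, and reward are invented for this toy
# example; this is not the algorithm described in the paper.

from typing import Dict, Tuple

State = str
AugState = Tuple[State, bool]  # (base state, "g visited so far?")

ACTIONS = ["left", "right"]

# Toy deterministic dynamics over three base states.
BASE_TRANSITIONS: Dict[Tuple[State, str], State] = {
    ("s0", "right"): "s1", ("s0", "left"): "s0",
    ("s1", "right"): "g",  ("s1", "left"): "s0",
    ("g",  "right"): "g",  ("g",  "left"): "s1",
}

def augment(state: State, seen_g: bool) -> AugState:
    """Augmented state: the base state plus one bit of history."""
    return (state, seen_g or state == "g")

def step(aug: AugState, action: str) -> Tuple[AugState, float]:
    """Markovian transition and reward over the augmented state space."""
    base, seen_g = aug
    next_base = BASE_TRANSITIONS[(base, action)]
    next_aug = augment(next_base, seen_g)
    # On base states this reward is non-Markovian (it depends on whether
    # g was visited at any time in the past); on augmented states it
    # depends only on the current augmented state.
    reward = 10.0 if next_aug[1] else 0.0
    return next_aug, reward

if __name__ == "__main__":
    aug = augment("s0", False)
    for a in ["right", "right", "left"]:
        aug, r = step(aug, a)
        print(a, "->", aug, "reward", r)
```

Note that after the agent leaves g, the reward persists because the history bit remains set; this is exactly the kind of temporally extended reward that cannot be expressed on the base states alone, and the paper's contribution is to construct such an augmented MDP automatically from temporal logic reward formulas rather than by hand.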