Probabilistic planning with non-linear utility functions and worst-case guarantees

Authors:
Stefano Ermon;Carla Gomes;Bart Selman;Alexander Vladimirsky
Affiliations:
Cornell University;Cornell University;Cornell University;Cornell University
Venue:
Proceedings of the 11th International Conference on Autonomous Agents and Multiagent Systems - Volume 2
Year:
2012

Abstract

Markov Decision Processes are one of the most widely used frameworks for formulating probabilistic planning problems. Since planners are often risk-sensitive in high-stakes situations, non-linear utility functions are commonly introduced to describe their preferences among all possible outcomes. Alternatively, risk-sensitive decision makers may require their plans to satisfy certain worst-case guarantees. We show how to combine these two approaches by considering problems where we maximize the expected utility of the total reward subject to worst-case constraints. We generalize several existing results on the structure of optimal policies to the constrained case, for both finite- and infinite-horizon problems. We provide a Dynamic Programming algorithm to compute the optimal policy, and we introduce an admissible heuristic to effectively prune the search space. Finally, we use a stochastic shortest path problem on large real-world road networks to demonstrate the practical applicability of our method.
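
The constrained objective described in the abstract can be illustrated with a minimal finite-horizon sketch. This is not the authors' algorithm, and it omits their admissible heuristic. It is a plain dynamic program over an augmented state (state, accumulated reward, time step), under the assumption that the worst-case guarantee takes the form "the total reward is at least `lower_bound` in every possible outcome". The toy MDP in `TRANSITIONS`, the exponential `utility`, and the `solve` function are all hypothetical names chosen for illustration.

```python
import math
from functools import lru_cache

# Hypothetical toy MDP: TRANSITIONS[state][action] -> [(prob, next_state, reward)]
TRANSITIONS = {
    "s": {
        "safe": [(1.0, "s", 1)],
        "risky": [(0.5, "s", 4), (0.5, "s", -1)],
    },
}


def utility(total_reward):
    # Assumed concave (risk-averse) utility; any non-linear function could be used.
    return 1.0 - math.exp(-0.1 * total_reward)


def solve(horizon, lower_bound, start="s"):
    """Return (value, policy) for `start` with zero accumulated reward.

    The policy maps (state, accumulated_reward, t) to the chosen action,
    or to None when no action can honor the worst-case guarantee.
    """
    policy = {}

    @lru_cache(maxsize=None)
    def value(state, total, t):
        if t == horizon:
            # -inf marks outcomes that violate the worst-case constraint.
            return utility(total) if total >= lower_bound else -math.inf
        best, best_action = -math.inf, None
        for action, outcomes in TRANSITIONS[state].items():
            succ = [(p, value(s2, total + r, t + 1))
                    for p, s2, r in outcomes if p > 0]
            if any(v == -math.inf for _, v in succ):
                continue  # some outcome breaks the guarantee: skip this action
            expected = sum(p * v for p, v in succ)
            if expected > best:
                best, best_action = expected, action
        policy[(state, total, t)] = best_action
        return best

    return value(start, 0, 0), policy


if __name__ == "__main__":
    for bound in (-math.inf, 0, 2, 3):
        v, pi = solve(horizon=2, lower_bound=bound)
        print(bound, v, pi[("s", 0, 0)])
```

In this toy instance the chosen action depends on the accumulated reward and not only on the state: with `lower_bound=0`, the sketch can still take `risky` after a gain, but must fall back to `safe` after a loss. Tracking the accumulated reward in the state is what makes both the non-linear utility and the worst-case check computable in a single backward pass, at the cost of a larger state space.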