Regret-based reward elicitation for Markov decision processes

Authors:
Kevin Regan; Craig Boutilier
Affiliations:
University of Toronto, Toronto, ON, Canada; University of Toronto, Toronto, ON, Canada
Venue:
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Year:
2009

Abstract

The specification of a Markov decision process (MDP) can be difficult. Reward function specification is especially problematic: in practice, it is often cognitively complex and time-consuming for users to specify rewards precisely. This work casts reward specification as a preference elicitation problem and aims to minimize the precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We first discuss how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion. We then demonstrate how regret can be reduced by efficiently eliciting reward information with bound queries, using regret reduction as the criterion for choosing suitable queries. Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to precise specification of the entire reward function.
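The two ideas in the abstract, minimax regret under partial reward information and bound queries chosen by regret reduction, can be made concrete on a toy problem. The following is a brute-force sketch, not the authors' algorithm. It assumes a hypothetical two-state, two-action MDP, represents partial reward information as independent interval bounds on each reward r(s, a), and restricts the agent to deterministic policies so that all of them can be enumerated. The max regret of a policy with occupancy frequencies f is the largest value of r · (g - f) over feasible rewards r and alternative policies g. The names `occupancy`, `max_regret_vs`, `minimax_regret`, `best_bound_query`, `P`, `ALPHA` and `GAMMA` are illustrative. The query rule is likewise an assumption: ask "is r(s, a) >= midpoint?" for the (s, a) whose worse answer leaves the lowest minimax regret.

```python
import itertools

# Hypothetical toy MDP (not from the paper): 2 states, 2 actions.
# P[s][a][s2] is the probability of moving from s to s2 under action a.
GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]
ALPHA = [1.0, 0.0]  # initial state distribution
P = {
    0: {0: [0.9, 0.1], 1: [0.2, 0.8]},
    1: {0: [0.5, 0.5], 1: [0.1, 0.9]},
}


def occupancy(policy, iters=1000):
    """Discounted occupancy frequencies f(s, a) of a deterministic policy."""
    d = list(ALPHA)
    for _ in range(iters):
        # Fixed point of d(s) = alpha(s) + gamma * sum_{s'} d(s') P(s | s', policy(s')).
        d = [ALPHA[s] + GAMMA * sum(d[sp] * P[sp][policy[sp]][s] for sp in STATES)
             for s in STATES]
    return {(s, a): (d[s] if policy[s] == a else 0.0) for s in STATES for a in ACTIONS}


def max_regret_vs(f, g, bounds):
    """Max over rewards r inside the interval bounds of r . (g - f).

    With independent intervals the max decomposes per (s, a): take the
    upper bound where g visits more than f, otherwise the lower bound.
    """
    total = 0.0
    for sa, (lo, hi) in bounds.items():
        diff = g[sa] - f[sa]
        total += (hi if diff > 0 else lo) * diff
    return total


def minimax_regret(bounds):
    """Deterministic policy with the smallest max regret, and that regret."""
    policies = [dict(zip(STATES, acts))
                for acts in itertools.product(ACTIONS, repeat=len(STATES))]
    occ = [occupancy(p) for p in policies]
    best = None
    for p, f in zip(policies, occ):
        # For a fixed reward some deterministic policy is optimal, so the
        # adversary loses nothing by choosing g among deterministic policies.
        mr = max(max_regret_vs(f, g, bounds) for g in occ)
        if best is None or mr < best[1]:
            best = (p, mr)
    return best


def best_bound_query(bounds):
    """Midpoint query on r(s, a) whose worse answer leaves the least minimax regret."""
    best = None
    for sa, (lo, hi) in bounds.items():
        mid = (lo + hi) / 2
        yes = {**bounds, sa: (mid, hi)}  # answer: r(s, a) >= mid
        no = {**bounds, sa: (lo, mid)}   # answer: r(s, a) <= mid
        worst = max(minimax_regret(yes)[1], minimax_regret(no)[1])
        if best is None or worst < best[1]:
            best = (sa, worst)
    return best


# Usage: hypothetical interval bounds on each reward r(s, a).
bounds = {(0, 0): (0.0, 1.0), (0, 1): (0.0, 1.0),
          (1, 0): (0.0, 1.0), (1, 1): (0.5, 2.0)}
policy, regret = minimax_regret(bounds)
query, worst_after = best_bound_query(bounds)
print("policy", policy, "max regret", round(regret, 3))
print("ask about", query, "worst-case regret after answer", round(worst_after, 3))
```

Either answer to a bound query shrinks the feasible reward set, so no policy's max regret can increase. Enumerating every deterministic policy grows exponentially with the number of states, so this sketch only illustrates the definitions. Restricting the agent (though not the adversary) to deterministic policies can also overstate minimax regret, because a regret-minimizing policy may need to randomize.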