Eliciting additive reward functions for Markov decision processes

Authors:
Kevin Regan; Craig Boutilier
Affiliations:
Department of Computer Science, University of Toronto; Department of Computer Science, University of Toronto
Venue:
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
Year:
2011

References cited: 5
Cited by: 3

Markov Decision Processes: Discrete Stochastic Dynamic Programming
The Vision of Autonomic Computing

Computer
A decision-theoretic approach to task assistance for persons with dementia

IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Constraint-based optimization and utility elicitation using the minimax decision criterion

Artificial Intelligence
Regret-based reward elicitation for Markov decision processes

UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence

Robust online optimization of reward-uncertain MDPs

IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Three
People, sensors, decisions: Customizable and adaptive technologies for assistance in healthcare

ACM Transactions on Interactive Intelligent Systems (TiiS) - Special issue on highlights of the decade in interactive intelligent systems
Interactive value iteration for Markov decision processes with unknown rewards

IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence

Abstract

Specifying the reward function of a Markov decision process (MDP) can be demanding, requiring human assessment of the precise quality of, and tradeoffs among, various states and actions. However, reward functions often possess considerable structure which can be leveraged to streamline their specification. We develop new, decision-theoretically sound heuristics for eliciting rewards for factored MDPs whose reward functions exhibit additive independence. Since we can often find good policies without complete reward specification, we also develop new (exact and approximate) algorithms for robust optimization of imprecise-reward MDPs with such additive reward. Our methods are evaluated in two domains: autonomic computing and assistive technology.
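
To make the notion of additive independence concrete, the following is a minimal sketch, not the authors' algorithm. It assumes a reward that decomposes as r(x) = sum_i r_i(x_i) over state attributes, with each local reward known only up to an interval. The attribute names, domains, and numeric bounds are hypothetical, and the sketch ignores MDP dynamics and policies entirely: it only shows how additivity lets bounds and worst-case regret be computed attribute by attribute. Max regret is used here as one common robustness measure; the abstract itself does not name the criterion.

```python
from itertools import product

# Hypothetical factored state space: each attribute has a small discrete domain.
DOMAINS = {"cpu": ["low", "high"], "latency": ["slow", "fast"]}

# Hypothetical interval bounds (lower, upper) on each local reward r_i(x_i).
BOUNDS = {
    "cpu": {"low": (0.0, 0.2), "high": (0.5, 1.0)},
    "latency": {"slow": (0.0, 0.1), "fast": (0.4, 0.6)},
}


def additive_reward(state, local_rewards):
    """Additive reward r(x) = sum_i r_i(x_i) for fully specified local rewards."""
    return sum(local_rewards[attr][val] for attr, val in state.items())


def reward_bounds(state):
    """Lower and upper bounds on r(x) implied by the per-attribute intervals."""
    lower = sum(BOUNDS[attr][val][0] for attr, val in state.items())
    upper = sum(BOUNDS[attr][val][1] for attr, val in state.items())
    return lower, upper


def max_regret(chosen, alternative):
    """Worst case of r(alternative) - r(chosen) over rewards within BOUNDS.

    Attributes on which the two states agree cancel out. Where they differ,
    the adversary sets the alternative's local reward to its upper bound and
    the chosen state's local reward to its lower bound.
    """
    total = 0.0
    for attr in DOMAINS:
        c, a = chosen[attr], alternative[attr]
        if c != a:
            total += BOUNDS[attr][a][1] - BOUNDS[attr][c][0]
    return total


def minimax_regret_state():
    """State whose worst-case regret against every other state is smallest."""
    states = [dict(zip(DOMAINS, vals)) for vals in product(*DOMAINS.values())]
    return min(states, key=lambda s: max(max_regret(s, t) for t in states))


if __name__ == "__main__":
    best = minimax_regret_state()
    print(best, reward_bounds(best))
```

Because the reward is additive, the number of bounds to elicit grows with the sum of the attribute domain sizes rather than with the size of the joint state space, which is the kind of structure the abstract refers to.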