Robust online optimization of reward-uncertain MDPs

  • Authors:
  • Kevin Regan; Craig Boutilier

  • Affiliations:
  • Department of Computer Science, University of Toronto; Department of Computer Science, University of Toronto

  • Venue:
  • IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three
  • Year:
  • 2011

Abstract

Imprecise-reward Markov decision processes (IRMDPs) are MDPs in which the reward function is only partially specified (e.g., by some elicitation process). Recent work using minimax regret to solve IRMDPs has shown how, despite the theoretical intractability of regret computation, the set of policies that are nondominated w.r.t. reward uncertainty can be exploited to accelerate that computation. However, the number of nondominated policies is generally so large as to undermine this leverage; in practice only a small subset can be used, yielding an approximation of minimax regret. In this paper, we show how the quality of this approximation can be improved online by pruning/adding nondominated policies during reward elicitation, while maintaining computational tractability. Drawing insights from the POMDP literature, we also develop a new anytime algorithm for constructing the set of nondominated policies with provable (anytime) error bounds. These bounds can be exploited to great effect in our online approximation scheme.
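
For reference, a sketch of the minimax-regret formulation underlying this line of work, using standard notation (the notation is assumed here, not taken verbatim from the paper): let $\mathcal{R}$ be the set of reward functions consistent with the partial specification, and let $V^{\pi}_{r}$ denote the value of policy $\pi$ under reward $r$.

% Sketch of the standard minimax-regret formulation for IRMDPs
% (assumed notation; not quoted from the paper).
\begin{align*}
  \text{regret of } \pi \text{ w.r.t. } r:\quad
    & R(\pi, r) = \max_{\pi'} V^{\pi'}_{r} - V^{\pi}_{r} \\
  \text{max regret over } \mathcal{R}:\quad
    & \mathit{MR}(\pi, \mathcal{R}) = \max_{r \in \mathcal{R}} R(\pi, r) \\
  \text{minimax regret}:\quad
    & \mathit{MMR}(\mathcal{R}) = \min_{\pi} \mathit{MR}(\pi, \mathcal{R})
\end{align*}

A policy $\pi$ is nondominated w.r.t. $\mathcal{R}$ if it is optimal for some $r \in \mathcal{R}$, i.e., $V^{\pi}_{r} \ge V^{\pi'}_{r}$ for all $\pi'$. Since the adversarial witness policy in $\mathit{MR}(\pi, \mathcal{R})$ can always be taken to be optimal for the maximizing reward, it can be restricted to the nondominated set, which is why that set (or an approximating subset of it) can accelerate regret computation.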