Mathematics of Operations Research
Bounded-parameter Markov decision process
Artificial Intelligence
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Near-Optimal Reinforcement Learning in Polynomial Time
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
R-max - a general polynomial time algorithm for near-optimal reinforcement learning
The Journal of Machine Learning Research
Mathematics of Operations Research
A theoretical analysis of Model-Based Interval Estimation
ICML '05 Proceedings of the 22nd international conference on Machine learning
Bias and Variance Approximation in Value Function Estimates
Management Science
Robust Control of Markov Decision Processes with Uncertain Transition Matrices
Operations Research
Convex Approximations of Chance Constrained Programs
SIAM Journal on Optimization
Model based Bayesian exploration
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Efficient Uncertainty Propagation for Reinforcement Learning with Limited Data
ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part I
Regret-based reward elicitation for Markov decision processes
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Uncertainty Propagation for Efficient Exploration in Reinforcement Learning
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence
A Bayesian Approach for Learning and Planning in Partially Observable Markov Decision Processes
The Journal of Machine Learning Research
Robust online optimization of reward-uncertain MDPs
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three
Markov decision processes (MDPs) are an effective tool for modeling decision-making in uncertain dynamic environments. Because the parameters of these models are typically estimated from data, learned from experience, or designed by hand, it is not surprising that the actual performance of a chosen strategy often differs significantly from the designer's initial expectations due to unavoidable model uncertainty. In this paper, we present a percentile criterion that captures the trade-off between optimistic and pessimistic points of view on an MDP with parameter uncertainty. We describe tractable methods that account for parameter uncertainty in the decision-making process. Finally, we propose a cost-effective exploration strategy for settings where it is possible to invest resources (money, time, or computational effort) in actions that reduce the uncertainty in the parameters.
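The percentile idea in the abstract can be illustrated by Monte Carlo sampling: draw candidate transition models from a distribution representing parameter uncertainty, evaluate a fixed policy under each, and read off the eta-percentile of the resulting value distribution, i.e. a return level the policy attains with probability roughly 1 - eta. The sketch below is illustrative only; the Dirichlet posterior, the toy MDP, and the function names are assumptions for the demo, not the paper's actual formulation or algorithm.

```python
import numpy as np

def policy_value(P, R, policy, gamma=0.95):
    # Exact value of a fixed deterministic policy on a known MDP:
    # solve (I - gamma * P_pi) v = r_pi.
    # P: (S, A, S) transition tensor, R: (S, A) rewards, policy: (S,) actions.
    S = R.shape[0]
    idx = np.arange(S)
    P_pi = P[idx, policy]   # (S, S): next-state distribution under the policy
    r_pi = R[idx, policy]   # (S,): one-step reward under the policy
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def percentile_score(P_samples, R, policy, s0, eta=0.1, gamma=0.95):
    # eta-percentile of the start-state value across sampled models:
    # a pessimism/optimism dial (small eta = pessimistic, large = optimistic).
    vals = [policy_value(P, R, policy, gamma)[s0] for P in P_samples]
    return float(np.quantile(vals, eta))

# Demo: 2 states, 2 actions; transition models drawn from a Dirichlet
# posterior (hypothetical counts, standing in for data-driven uncertainty).
rng = np.random.default_rng(0)
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])
counts = np.array([[[8., 2.], [2., 8.]],
                   [[5., 5.], [6., 4.]]])   # (S, A, S) Dirichlet parameters
P_samples = [np.array([[rng.dirichlet(counts[s, a]) for a in range(2)]
                       for s in range(2)])
             for _ in range(500)]

policy = np.array([0, 1])
score = percentile_score(P_samples, R, policy, s0=0, eta=0.1)
print(f"10th-percentile value of the policy at s0: {score:.3f}")
```

Ranking candidate policies by `percentile_score` at a small eta yields a choice that hedges against unfavorable parameter realizations, rather than optimizing only the nominal (point-estimate) model.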