Optimal solutions to Markov decision problems may be very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of these probabilities is far from accurate. Hence, estimation errors are limiting factors in applying Markov decision processes to real-world problems. We consider a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets. We show that perfect duality holds for this problem, and that as a consequence, it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm. We show that a particular choice of the uncertainty sets, involving likelihood regions or entropy bounds, leads to both a statistically accurate representation of uncertainty, and a complexity of the robust recursion that is almost the same as that of the classical recursion. Hence, robustness can be added at practically no extra computing cost. We derive similar results for other uncertainty sets, including one with a finite number of possible values for the transition matrices. We describe in a practical path planning example the benefits of using a robust strategy instead of the classical optimal strategy; even if the uncertainty level is only crudely guessed, the robust strategy yields a much better worst-case expected travel time.
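The robust recursion described above can be sketched for the simplest uncertainty model mentioned in the abstract: a finite set of candidate transition matrices for each state-action pair. The following is a minimal illustration, not the paper's implementation; the function name `robust_value_iteration`, the infinite-horizon discounted setting, and the stopping tolerance are all assumptions made for this sketch. The adversary's inner step reduces to a maximum over the finite set, so each robust Bellman backup costs only a constant factor more than the classical one.

```python
import numpy as np

def robust_value_iteration(costs, transition_sets, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Robust value iteration for a finite-state, finite-action MDP whose
    transition distribution for each (state, action) pair is only known to
    lie in a finite uncertainty set (a sketch; the paper also treats
    likelihood-region and entropy-bound sets).

    costs:           array of shape (S, A), immediate cost c(s, a)
    transition_sets: transition_sets[s][a] is a list of candidate
                     probability vectors over next states (each length S)
    """
    S, A = costs.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        V_new = np.empty(S)
        for s in range(S):
            q = np.empty(A)
            for a in range(A):
                # Inner problem: adversary picks the cost-maximizing
                # distribution from the uncertainty set.
                worst = max(p @ V for p in transition_sets[s][a])
                q[a] = costs[s, a] + gamma * worst
            # Outer problem: controller minimizes against the worst case.
            V_new[s] = q.min()
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V
```

For general convex (or even nonconvex) uncertainty sets, only the inner `max` changes: it becomes an optimization over the set, which for the likelihood and entropy sets studied in the paper can be solved by a simple one-dimensional bisection, preserving the near-classical complexity of the recursion.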