An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem: finding the k best policies of a discrete Markov decision process. The k best policies, for k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem via this reduction requires an unreasonable amount of time even when k = 3. We then provide two new algorithms. The first is a complete algorithm, based on our theoretical contribution that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. The second is an approximate algorithm that skips many less useful policies. We show that both algorithms scale well, and that the approximate algorithm runs much faster and finds interesting, high-quality policies.
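The complete algorithm described above rests on the structural fact that the k-th best policy differs from some better-ranked policy in exactly one state. The Python sketch below shows one way that fact can drive enumeration: rank the optimal policy first, then repeatedly score all single-state deviations of already-ranked policies and pop the best unranked candidate from a heap. This is an illustrative reconstruction under assumptions, not the authors' implementation: the toy MDP, the function names, and the choice to rank policies by the sum of their state values are all ours.

import heapq
import numpy as np

def policy_value(P, R, pi, gamma=0.95):
    # Exact evaluation of a deterministic policy pi (one action index per state):
    # solve (I - gamma * P_pi) V = R_pi, then score the policy by sum(V).
    # (Summing V over states is an illustrative scoring choice, not the paper's.)
    idx = np.arange(len(pi))
    P_pi = P[idx, list(pi)]                      # (n, n): row s is P[s, pi[s], :]
    R_pi = R[idx, list(pi)]                      # (n,): expected reward under pi
    V = np.linalg.solve(np.eye(len(pi)) - gamma * P_pi, R_pi)
    return float(V.sum())

def optimal_policy(P, R, gamma=0.95, iters=500):
    # Plain value iteration followed by a greedy extraction; this is the
    # k = 1 case that dynamic programming handles directly.
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = (R + gamma * (P @ V)).max(axis=1)    # P @ V has shape (n_states, n_actions)
    return tuple((R + gamma * (P @ V)).argmax(axis=1))

def k_best_policies(P, R, k, gamma=0.95):
    # Grow the ranked list one policy at a time. Before each pop, the frontier
    # holds every unranked single-state deviation of every ranked policy, so by
    # the one-state-difference theorem its maximum is the next-best policy.
    n_states, n_actions = R.shape
    pi0 = optimal_policy(P, R, gamma)
    ranked = [(policy_value(P, R, pi0, gamma), pi0)]
    seen = {pi0}
    frontier = []                                # max-heap via negated values
    while len(ranked) < k:
        _, base = ranked[-1]                     # newest ranked policy spawns neighbors
        for s in range(n_states):
            for a in range(n_actions):
                if a == base[s]:
                    continue
                cand = base[:s] + (a,) + base[s + 1:]
                if cand not in seen:
                    seen.add(cand)
                    heapq.heappush(frontier, (-policy_value(P, R, cand, gamma), cand))
        if not frontier:                         # fewer than k policies exist
            break
        neg_v, pi = heapq.heappop(frontier)
        ranked.append((-neg_v, pi))
    return ranked

# Toy 2-state, 2-action MDP (illustrative numbers): P[s, a, s'] and R[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
for value, pi in k_best_policies(P, R, k=3):
    print(pi, round(value, 2))

In this sketch, each new rank costs at most one policy evaluation per single-state deviation of the newest ranked policy, i.e., O(|S| * |A|) evaluations per step, which is the kind of per-step cost that avoids the exponential blowup of the Turing-reduction approach the abstract mentions.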