An optimal probabilistic-planning algorithm solves a problem, usually modeled as a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem: finding not only the optimal policy but the k best policies. For k > 1, these policies cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing-reduced to the optimal planning problem, but the number of problems queried by the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem via this reduction requires unreasonable amounts of time even for k = 3. We then provide a new algorithm, based on our theoretical result that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. The time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is only linear in k. We demonstrate empirically that the new algorithm scales well.
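The structural result above suggests a natural enumeration scheme: once a policy is ranked, its single-state deviations become candidates for the next rank. The sketch below illustrates this idea on a tiny stochastic shortest-path MDP. It is an illustrative reconstruction, not the paper's exact algorithm: the MDP, the names (`mdp`, `evaluate`, `optimal_policy`, `k_best_policies`), and the choice to rank policies by expected cost from a single start state are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's exact algorithm): rank policies of a
# tiny stochastic shortest-path MDP by best-first exploration of
# single-state deviations of already-ranked policies.

GOAL = "g"

# Transition model: mdp[state][action] -> list of (prob, next_state, cost).
# This toy MDP is an assumption made for the example.
mdp = {
    0: {"a": [(1.0, 1, 1.0)],
        "b": [(0.5, GOAL, 5.0), (0.5, 1, 5.0)]},
    1: {"a": [(1.0, GOAL, 1.0)],
        "b": [(1.0, GOAL, 3.0)]},
}
STATES = sorted(mdp)

def evaluate(policy, sweeps=200):
    """Expected cost-to-goal under a fixed policy, by fixed-point iteration."""
    v = {s: 0.0 for s in STATES}
    v[GOAL] = 0.0
    for _ in range(sweeps):
        for s in STATES:
            v[s] = sum(p * (c + v[ns]) for p, ns, c in mdp[s][policy[s]])
    return v

def optimal_policy(sweeps=200):
    """Value iteration followed by greedy action extraction."""
    v = {s: 0.0 for s in STATES}
    v[GOAL] = 0.0
    for _ in range(sweeps):
        for s in STATES:
            v[s] = min(sum(p * (c + v[ns]) for p, ns, c in outs)
                       for outs in mdp[s].values())
    def q(s, a):
        return sum(p * (c + v[ns]) for p, ns, c in mdp[s][a])
    return {s: min(mdp[s], key=lambda a: q(s, a)) for s in STATES}

def k_best_policies(k, start=0):
    """Rank k policies by value at `start`. Each newly ranked policy
    contributes its one-state deviations to the candidate frontier,
    mirroring the structural property that the k-th best policy differs
    from some better policy on exactly one state."""
    first = optimal_policy()
    ranked = [(evaluate(first)[start], first)]
    frontier = {}
    while len(ranked) < k:
        _, base = ranked[-1]          # expand the most recently ranked policy
        for s in STATES:
            for a in mdp[s]:
                if a == base[s]:
                    continue
                cand = dict(base)
                cand[s] = a           # deviate on exactly one state
                key = tuple(sorted(cand.items()))
                if key not in frontier and \
                        all(key != tuple(sorted(p.items())) for _, p in ranked):
                    frontier[key] = (evaluate(cand)[start], cand)
        best = min(frontier, key=lambda kk: frontier[kk][0])
        ranked.append(frontier.pop(best))
    return ranked
```

Note that each ranked policy is expanded exactly once, so the number of policy evaluations grows linearly in k while the frontier bookkeeping accounts for the quadratic-in-k overall cost, consistent with the complexity claimed in the abstract.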