An optimal probabilistic-planning algorithm solves a problem, usually modeled as a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem: finding not only the optimal policy but the k best policies. For k > 1, these policies cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing-reduced to the optimal planning problem, but the number of problems queried by the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem via this reduction requires unreasonable amounts of time even for k = 3. We then provide a new algorithm, based on our theoretical result that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. The time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is only linear in k. We demonstrate empirically that the new algorithm scales well.
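The structural result above suggests a natural enumeration scheme: once a policy is ranked, its single-state deviations become candidates for the next rank. The sketch below illustrates this idea on a tiny stochastic shortest-path MDP. It is an illustrative reconstruction, not the paper's exact algorithm: the MDP, the names (`mdp`, `evaluate`, `optimal_policy`, `k_best_policies`), and the choice to rank policies by expected cost from a single start state are all assumptions made for the example.

```python
# Illustrative sketch (not the paper's exact algorithm): rank policies of a
# tiny stochastic shortest-path MDP by best-first exploration of
# single-state deviations of already-ranked policies.

GOAL = "g"

# Transition model: mdp[state][action] -> list of (prob, next_state, cost).
# This toy MDP is an assumption made for the example.
mdp = {
    0: {"a": [(1.0, 1, 1.0)],
        "b": [(0.5, GOAL, 5.0), (0.5, 1, 5.0)]},
    1: {"a": [(1.0, GOAL, 1.0)],
        "b": [(1.0, GOAL, 3.0)]},
}
STATES = sorted(mdp)

def evaluate(policy, sweeps=200):
    """Expected cost-to-goal under a fixed policy, by fixed-point iteration."""
    v = {s: 0.0 for s in STATES}
    v[GOAL] = 0.0
    for _ in range(sweeps):
        for s in STATES:
            v[s] = sum(p * (c + v[ns]) for p, ns, c in mdp[s][policy[s]])
    return v

def optimal_policy(sweeps=200):
    """Value iteration followed by greedy action extraction."""
    v = {s: 0.0 for s in STATES}
    v[GOAL] = 0.0
    for _ in range(sweeps):
        for s in STATES:
            v[s] = min(sum(p * (c + v[ns]) for p, ns, c in outs)
                       for outs in mdp[s].values())
    def q(s, a):
        return sum(p * (c + v[ns]) for p, ns, c in mdp[s][a])
    return {s: min(mdp[s], key=lambda a: q(s, a)) for s in STATES}

def k_best_policies(k, start=0):
    """Rank k policies by value at `start`. Each newly ranked policy
    contributes its one-state deviations to the candidate frontier,
    mirroring the structural property that the k-th best policy differs
    from some better policy on exactly one state."""
    first = optimal_policy()
    ranked = [(evaluate(first)[start], first)]
    frontier = {}
    while len(ranked) < k:
        _, base = ranked[-1]          # expand the most recently ranked policy
        for s in STATES:
            for a in mdp[s]:
                if a == base[s]:
                    continue
                cand = dict(base)
                cand[s] = a           # deviate on exactly one state
                key = tuple(sorted(cand.items()))
                if key not in frontier and \
                        all(key != tuple(sorted(p.items())) for _, p in ranked):
                    frontier[key] = (evaluate(cand)[start], cand)
        best = min(frontier, key=lambda kk: frontier[kk][0])
        ranked.append(frontier.pop(best))
    return ranked
```

Note that each ranked policy is expanded exactly once, so the number of policy evaluations grows linearly in k while the frontier bookkeeping accounts for the quadratic-in-k overall cost, consistent with the complexity claimed in the abstract.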