An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding an optimal policy. In this paper, we study the k best policies problem: finding the k best policies of a discrete Markov decision process. The k best policies, for k > 1, cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried in the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem via this reduction requires an unreasonable amount of time even when k = 3. We then provide two new algorithms. The first is a complete algorithm, based on our theoretical contribution that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. The second is an approximate algorithm that skips many less useful policies. We show that both algorithms scale well, and that the approximate algorithm runs much faster and finds interesting, high-quality policies.
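The complete algorithm described above rests on the structural fact that the k-th best policy differs from some better-ranked policy in exactly one state. The Python sketch below shows one way that fact can drive enumeration: rank the optimal policy first, then repeatedly score all single-state deviations of already-ranked policies and pop the best unranked candidate from a heap. This is an illustrative reconstruction under assumptions, not the authors' implementation: the toy MDP, the function names, and the choice to rank policies by the sum of their state values are all ours.

import heapq
import numpy as np

def policy_value(P, R, pi, gamma=0.95):
    # Exact evaluation of a deterministic policy pi (one action index per state):
    # solve (I - gamma * P_pi) V = R_pi, then score the policy by sum(V).
    # (Summing V over states is an illustrative scoring choice, not the paper's.)
    idx = np.arange(len(pi))
    P_pi = P[idx, list(pi)]                      # (n, n): row s is P[s, pi[s], :]
    R_pi = R[idx, list(pi)]                      # (n,): expected reward under pi
    V = np.linalg.solve(np.eye(len(pi)) - gamma * P_pi, R_pi)
    return float(V.sum())

def optimal_policy(P, R, gamma=0.95, iters=500):
    # Plain value iteration followed by a greedy extraction; this is the
    # k = 1 case that dynamic programming handles directly.
    V = np.zeros(R.shape[0])
    for _ in range(iters):
        V = (R + gamma * (P @ V)).max(axis=1)    # P @ V has shape (n_states, n_actions)
    return tuple((R + gamma * (P @ V)).argmax(axis=1))

def k_best_policies(P, R, k, gamma=0.95):
    # Grow the ranked list one policy at a time. Before each pop, the frontier
    # holds every unranked single-state deviation of every ranked policy, so by
    # the one-state-difference theorem its maximum is the next-best policy.
    n_states, n_actions = R.shape
    pi0 = optimal_policy(P, R, gamma)
    ranked = [(policy_value(P, R, pi0, gamma), pi0)]
    seen = {pi0}
    frontier = []                                # max-heap via negated values
    while len(ranked) < k:
        _, base = ranked[-1]                     # newest ranked policy spawns neighbors
        for s in range(n_states):
            for a in range(n_actions):
                if a == base[s]:
                    continue
                cand = base[:s] + (a,) + base[s + 1:]
                if cand not in seen:
                    seen.add(cand)
                    heapq.heappush(frontier, (-policy_value(P, R, cand, gamma), cand))
        if not frontier:                         # fewer than k policies exist
            break
        neg_v, pi = heapq.heappop(frontier)
        ranked.append((-neg_v, pi))
    return ranked

# Toy 2-state, 2-action MDP (illustrative numbers): P[s, a, s'] and R[s, a].
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
for value, pi in k_best_policies(P, R, k=3):
    print(pi, round(value, 2))

In this sketch, each new rank costs at most one policy evaluation per single-state deviation of the newest ranked policy, i.e., O(|S| * |A|) evaluations per step, which is the kind of per-step cost that avoids the exponential blowup of the Turing-reduction approach the abstract mentions.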