Finding Best k Policies

  • Authors:
  • Peng Dai; Judy Goldsmith

  • Affiliations:
  • Computer Science & Engineering, University of Washington, Seattle, WA 98195-2350; Dept. of Computer Science, University of Kentucky, Lexington, KY 40506-0046, USA

  • Venue:
  • ADT '09 Proceedings of the 1st International Conference on Algorithmic Decision Theory
  • Year:
  • 2009

Abstract

An optimal probabilistic-planning algorithm solves a problem, usually modeled by a Markov decision process, by finding its optimal policy. In this paper, we study the k best policies problem: finding the k best policies rather than only the optimal one. For k > 1, the k best policies cannot be found directly using dynamic programming. Naïvely, finding the k-th best policy can be Turing reduced to the optimal planning problem, but the number of problems queried by the naïve algorithm is exponential in k. We show empirically that solving the k best policies problem with this reduction requires unreasonable amounts of time even when k = 3. We then provide a new algorithm based on our theoretical contribution: a proof that the k-th best policy differs from the i-th best policy, for some i < k, on exactly one state. We show that the time complexity of the algorithm is quadratic in k, but the number of optimal planning problems it solves is linear in k. We demonstrate empirically that the new algorithm has good scalability.
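
To make the problem statement concrete, the following is a minimal brute-force sketch, not the algorithm proposed in the paper. It enumerates every deterministic policy of a tiny MDP, evaluates each one, and keeps the k with the highest value. The toy MDP, the discount factor, the names `evaluate` and `best_k_policies`, and the choice to rank policies by their value at a single start state are all assumptions made for illustration; the abstract does not specify them. Because the number of policies grows exponentially with the number of states, this approach is only usable on very small problems.

```python
from itertools import product

# Hypothetical toy MDP (not from the paper): 3 states, 2 actions per state.
# P[s][a] lists (next_state, probability); R[s][a] is the immediate reward.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 0.8), (2, 0.2)]},
    1: {0: [(0, 1.0)], 1: [(2, 1.0)]},
    2: {0: [(2, 1.0)], 1: [(0, 0.5), (1, 0.5)]},
}
R = {
    0: {0: 0.0, 1: 1.0},
    1: {0: 0.5, 1: 2.0},
    2: {0: 1.5, 1: 0.0},
}
GAMMA = 0.9  # assumed discount factor


def evaluate(policy, sweeps=2000):
    """Iterative policy evaluation: returns the value of each state under `policy`."""
    n = len(P)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            R[s][policy[s]] + GAMMA * sum(p * v[t] for t, p in P[s][policy[s]])
            for s in range(n)
        ]
    return v


def best_k_policies(k, start=0):
    """Brute force: score every deterministic policy by its value at `start`
    (an assumed ranking criterion) and return the k highest-scoring ones."""
    n = len(P)
    scored = []
    for policy in product(*(sorted(P[s]) for s in range(n))):
        scored.append((evaluate(policy)[start], policy))
    # Stable sort: ties keep enumeration order.
    scored.sort(key=lambda item: -item[0])
    return scored[:k]


if __name__ == "__main__":
    for value, policy in best_k_policies(3):
        print(policy, round(value, 4))
```

Each returned entry pairs a policy (one action index per state) with its start-state value, listed from best to worst. Comparing consecutive entries shows in how many states each policy differs from the better-ranked ones.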