Heuristic search value iteration for POMDPs

  • Authors: Trey Smith; Reid Simmons
  • Affiliations: Carnegie Mellon University (both authors)
  • Venue: UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence
  • Year: 2004

Abstract

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI achieves speedups of more than 100 times relative to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.