Solving POMDPs by searching in policy space

  • Author: Eric A. Hansen
  • Affiliation: Computer Science Department, University of Massachusetts, Amherst, MA
  • Venue: UAI'98, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
  • Year: 1998

Abstract

Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy; they are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by searching in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.
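To make the finite-state-controller representation concrete, the sketch below shows the policy-evaluation step that underlies policy iteration for such controllers: each controller node q executes an action and transitions to a successor node on each observation, and the value V(q, s) of being in node q while the world is in state s satisfies a linear system that can be solved directly. The tiny two-state POMDP, the controller, and all numeric parameters here are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

# Hypothetical tiny POMDP (2 states, 2 actions, 2 observations).
# All numbers are illustrative only.
n_states, n_obs = 2, 2
gamma = 0.95

# T[a][s, s'] = P(s' | s, a); O[a][s', o] = P(o | s', a); R[s, a] = reward.
T = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.5, 0.5]])]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.6, 0.4], [0.4, 0.6]])]
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# A finite-state controller: each node chooses an action and, for each
# observation received, moves to a successor node.
actions = [0, 1]          # actions[q] = action executed at node q
succ = [[0, 1], [1, 0]]   # succ[q][o] = next node after observing o
n_nodes = len(actions)

# Policy evaluation: V(q, s) solves the linear system
#   V(q,s) = R(s,a_q)
#            + gamma * sum_{s',o} T[a_q][s,s'] * O[a_q][s',o] * V(succ[q][o], s')
# Build (I - gamma * M) V = b and solve it exactly.
n = n_nodes * n_states
A = np.eye(n)
b = np.zeros(n)
for q in range(n_nodes):
    a = actions[q]
    for s in range(n_states):
        i = q * n_states + s
        b[i] = R[s, a]
        for s2 in range(n_states):
            for o in range(n_obs):
                j = succ[q][o] * n_states + s2
                A[i, j] -= gamma * T[a][s, s2] * O[a][s2, o]

V = np.linalg.solve(A, b).reshape(n_nodes, n_states)
print(V)  # V[q, s]: value of running the controller from node q in state s
```

Because gamma < 1 and the transition-observation products form a stochastic matrix, the system is always solvable; a policy-improvement step would then use these node values to add, change, or prune controller nodes, which is the part the paper's algorithms address.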