Space-Progressive Value Iteration: An Anytime Algorithm for a Class of POMDPs

  • Authors:
  • Nevin Lianwen Zhang; Weihong Zhang


  • Venue:
  • ECSQARU '01 Proceedings of the 6th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
  • Year:
  • 2001

Abstract

Finding optimal policies for general partially observable Markov decision processes (POMDPs) is computationally difficult, primarily due to the need to perform dynamic-programming (DP) updates over the entire belief space. In this paper, we first study a somewhat restrictive class of special POMDPs called almost discernible POMDPs and propose an anytime algorithm called space-progressive value iteration (SPVI). SPVI does not perform DP updates over the entire belief space. Rather, it restricts DP updates to a belief subspace that grows over time. It is argued that, given sufficient time, SPVI can find near-optimal policies for almost discernible POMDPs. We then show how SPVI can be applied to a more general class of POMDPs. Empirical results are presented to show the effectiveness of SPVI.
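
To make the core idea concrete, the sketch below shows an anytime value-iteration loop that performs DP backups only over a finite set of belief points and periodically enlarges that set. This is a minimal illustration of the general "restrict DP updates to a growing belief subspace" idea, not the authors' SPVI algorithm: the model shapes for T, O, and R, the point-based backup, and the `expand` hook are assumptions made here for brevity.

```python
# Illustrative sketch only: generic point-based DP backups over a belief set
# that grows between rounds. Model conventions are assumed, not taken from
# the paper: T has shape (A, S, S'), O has shape (A, S', Z), R has shape (A, S).
import numpy as np

def backup(belief, V, T, O, R, gamma):
    """One DP backup at a single belief point.

    V is a value function represented as a list of alpha-vectors
    (each a length-|S| array); returns the best new alpha-vector for `belief`.
    """
    n_actions, n_obs = R.shape[0], O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(n_actions):
        alpha = R[a].astype(float).copy()
        for o in range(n_obs):
            # g_{a,o}(s) = sum_{s'} T[a, s, s'] * O[a, s', o] * alpha'(s')
            gao = [T[a] @ (O[a, :, o] * alpha_next) for alpha_next in V]
            # keep the alpha' that is best at this particular belief
            alpha += gamma * max(gao, key=lambda g: belief @ g)
        val = belief @ alpha
        if val > best_val:
            best_alpha, best_val = alpha, val
    return best_alpha

def anytime_subspace_vi(T, O, R, gamma, initial_beliefs, expand,
                        n_rounds=10, n_sweeps=20):
    """Anytime loop: alternate value-iteration sweeps restricted to the
    current belief set with expansion of that set (`expand` is a
    hypothetical hook returning additional belief points)."""
    beliefs = list(initial_beliefs)
    V = [np.zeros(T.shape[1])]              # start from a single zero alpha-vector
    for _ in range(n_rounds):
        for _ in range(n_sweeps):           # DP updates only over `beliefs`
            V = [backup(b, V, T, O, R, gamma) for b in beliefs]
        beliefs.extend(expand(beliefs, V))  # grow the belief subspace over time
    return V
```

Because each round returns a usable value function, the loop can be stopped at any time, which is what makes the scheme anytime; the quality of the resulting policy depends on how the belief subspace is chosen and grown.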