Decomposing Large-Scale POMDP Via Belief State Analysis

  • Authors:
  • Xin Li;William K. Cheung;Jiming Liu

  • Affiliations:
Department of Computer Science, Hong Kong Baptist University; Department of Computer Science, Hong Kong Baptist University; Department of Computer Science, Hong Kong Baptist University

  • Venue:
  • IAT '05 Proceedings of the IEEE/WIC/ACM International Conference on Intelligent Agent Technology
  • Year:
  • 2005

Abstract

The partially observable Markov decision process (POMDP) is commonly used to model a stochastic environment with unobservable states for supporting optimal decision making. Computing the optimal policy for a large-scale POMDP is known to be intractable. Belief compression, an approximate solution, has recently been proposed to reduce the dimension of a POMDP's belief state space and has been shown to be effective in improving problem tractability. In this paper, with the conjecture that temporally close belief states can be characterized by a lower intrinsic dimension, we propose a spatio-temporal belief clustering that considers both the spatial (in the belief space) and temporal similarities of belief states, and incorporate it into the belief compression algorithm. The proposed clustering yields belief state clusters as sub-POMDPs of much lower dimension, which can then be assigned to a set of distributed agents for collaborative problem solving. The proposed method has been tested on a synthesized navigation problem (Hallway2) and empirically shown to produce policies with superior long-term rewards compared with those based on belief compression alone. Some future research directions for extending this belief state analysis approach are also included.
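The spatio-temporal clustering described in the abstract can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the paper's algorithm: it runs k-means over belief vectors augmented with a scaled time index, so that belief states close both in the belief simplex and along the trajectory fall into the same cluster. The function name, the weighting parameter `lam`, and the toy trajectory are all hypothetical.

```python
import numpy as np

def spatio_temporal_cluster(beliefs, times, k=2, lam=0.5, iters=50, seed=0):
    """Cluster belief states by combined spatial + temporal similarity.

    A minimal k-means sketch (not the paper's exact method): each belief
    vector is augmented with its time index scaled by `lam`, so the usual
    Euclidean distance mixes spatial closeness in the belief space with
    temporal closeness along the agent's trajectory.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([np.asarray(beliefs, float),
                   lam * np.asarray(times, float)[:, None]])
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each augmented point to its nearest cluster center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute centers; keep the old center if a cluster empties
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Toy trajectory: two belief regimes visited at well-separated times.
beliefs = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.15, 0.85]]
labels = spatio_temporal_cluster(beliefs, times=[0, 1, 10, 11], k=2)
print(labels)
```

Each resulting cluster covers only a small region of the belief space, which is what allows it to be treated as a lower-dimensional sub-POMDP and handed to a separate agent.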