Improving POMDP tractability via belief compression and clustering

  • Authors:
  • Xin Li;William K. Cheung;Jiming Liu

  • Affiliations:
  • Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong (all three authors)

  • Venue:
  • IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
  • Year:
  • 2010

Abstract

The partially observable Markov decision process (POMDP) is a commonly adopted mathematical framework for solving planning problems in stochastic environments. However, computing the optimal policy of a POMDP for large-scale problems is known to be intractable, and the high dimensionality of the underlying belief space is one of the major causes. In this paper, we propose a hybrid approach that integrates two different approaches for reducing the dimensionality of the belief space: 1) belief compression and 2) value-directed compression. In particular, a novel orthogonal nonnegative matrix factorization is derived for belief compression, which is then integrated into a value-directed framework for computing the policy. In addition, based on the conjecture that a properly partitioned belief space can have its per-cluster intrinsic dimension further reduced, we propose applying a k-means-like clustering technique to partition the belief space into a set of sub-POMDPs before applying the dimension reduction techniques to each of them. We have evaluated the proposed belief compression and clustering approaches on a set of benchmark problems and demonstrated their effectiveness in reducing the cost of computing policies while retaining policy quality.
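The cluster-then-compress idea in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes plain Euclidean k-means with farthest-first initialization in place of the "k-means-like" method, and standard Lee-Seung multiplicative-update NMF in place of the orthogonal NMF derived in the paper; the value-directed step is omitted. All function names and the synthetic belief data are hypothetical.

```python
import numpy as np

def make_demo_beliefs(n_per_group=20, seed=0):
    """Synthetic beliefs over 6 states (illustrative data, not from the paper)."""
    rng = np.random.default_rng(seed)
    basis_a = np.array([[0.7, 0.2, 0.1, 0, 0, 0], [0.1, 0.3, 0.6, 0, 0, 0]])
    basis_b = np.array([[0, 0, 0, 0.6, 0.3, 0.1], [0, 0, 0, 0.1, 0.2, 0.7]])
    groups = []
    for basis in (basis_a, basis_b):
        t = rng.random((n_per_group, 1))
        # Each belief is a convex mixture of two distributions, so rows sum to 1.
        groups.append(t * basis[0] + (1 - t) * basis[1])
    return np.vstack(groups)

def kmeans(beliefs, k, iters=50):
    """Lloyd's k-means; Euclidean distance is an assumption of this sketch."""
    centers = [beliefs[0]]
    for _ in range(1, k):
        # Farthest-first initialization: pick the belief farthest from all centers.
        d = np.min([((beliefs - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(beliefs[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((beliefs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = beliefs[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    d = ((beliefs[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centers

def nmf(V, rank, iters=300, seed=0, eps=1e-9):
    """Standard (non-orthogonal) NMF: V ~ W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def compress_per_cluster(beliefs, k, rank):
    """Partition beliefs, then fit one low-rank nonnegative model per cluster."""
    labels, _ = kmeans(beliefs, k)
    models = []
    for j in range(k):
        V = beliefs[labels == j]
        if len(V) == 0:
            continue  # skip empty clusters
        W, H = nmf(V, rank)
        rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
        models.append({"cluster": j, "W": W, "H": H, "rel_err": rel_err})
    return labels, models

if __name__ == "__main__":
    labels, models = compress_per_cluster(make_demo_beliefs(), k=2, rank=2)
    for m in models:
        print(m["cluster"], m["W"].shape, m["H"].shape, m["rel_err"])
```

Here each row of `W` is a compressed belief and `H` maps it back to the full state space. Fitting a separate factorization per cluster is what lets each cluster use a lower rank than a single global factorization might need.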