Region-based value iteration for partially observable Markov decision processes

Authors:
Hui Li;Xuejun Liao;Lawrence Carin
Affiliations:
Duke University, Durham, NC;Duke University, Durham, NC;Duke University, Durham, NC
Venue:
ICML '06 Proceedings of the 23rd international conference on Machine learning
Year:
2006


Abstract

An approximate region-based value iteration (RBVI) algorithm is proposed to find the optimal policy for a partially observable Markov decision process (POMDP). RBVI approximates the true polyhedral partition of the belief simplex with an ellipsoidal partition, such that the optimal value function is linear within each ellipsoidal region. The position and shape of each region, as well as the gradient (alpha-vector) of the optimal value function in that region, are parameterized explicitly and estimated via efficient expectation maximization (EM) and variational Bayesian EM (VBEM), based on a set of selected sample belief points. RBVI maintains far fewer alpha-vectors than point-based methods and yields a more parsimonious representation that approximates the true value function in the maximum likelihood (ML) sense. Results on benchmark problems show that RBVI is comparable in performance to state-of-the-art algorithms, despite the small number of alpha-vectors used.
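The idea of a value function that is linear inside each ellipsoidal region can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact parameterization, so the sketch assumes each region is an axis-aligned Gaussian with a hypothetical `center`, per-dimension `variances`, and one `alpha` vector, and it assigns a belief to the single best-scoring region. The EM/VBEM fitting step is omitted, and all numbers are made up for illustration.

```python
import math


def pwlc_value(belief, alpha_vectors):
    """Standard POMDP value form: max over alpha-vectors of alpha . belief."""
    return max(
        sum(a * b for a, b in zip(alpha, belief)) for alpha in alpha_vectors
    )


def log_ellipsoid_score(belief, center, variances):
    """Log-density of a diagonal-covariance Gaussian evaluated at a belief.

    Assumption: the ellipsoid is axis-aligned; a full covariance matrix
    would allow rotated ellipsoids but needs a matrix inverse.
    """
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (b - c) ** 2 / v
        for b, c, v in zip(belief, center, variances)
    )


def region_value(belief, regions):
    """Pick the best-scoring region, then apply that region's linear value."""
    best = max(
        regions,
        key=lambda r: log_ellipsoid_score(belief, r["center"], r["variances"]),
    )
    return sum(a * b for a, b in zip(best["alpha"], belief))


# Hypothetical two-state problem with two regions (illustrative values only).
regions = [
    {"center": [0.8, 0.2], "variances": [0.02, 0.02], "alpha": [1.0, 0.0]},
    {"center": [0.2, 0.8], "variances": [0.02, 0.02], "alpha": [0.0, 1.0]},
]

belief = [0.7, 0.3]
print(region_value(belief, regions))
print(pwlc_value(belief, [r["alpha"] for r in regions]))
```

In this toy setup the region-based evaluation and the max over alpha-vectors agree, because each region sits where its own alpha-vector dominates. With fitted regions the two can differ near region boundaries, which is where the ellipsoidal partition only approximates the polyhedral one.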