An Uncertainty-Based Belief Selection Method for POMDP Value Iteration

Authors:
Qi Feng;Xuezhong Zhou;Houkuan Huang;Xiaoping Zhang
Affiliations:
School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China 100044;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China 100044;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China 100044;School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China 100044
Venue:
ECSQARU '09 Proceedings of the 10th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty
Year:
2009

Citing 7
Cited 0

Planning and acting in partially observable stochastic domains

Artificial Intelligence
Heuristic search value iteration for POMDPs

UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Speeding up the convergence of value iteration in partially observable Markov decision processes

Journal of Artificial Intelligence Research
Perseus: randomized point-based value iteration for POMDPs

Journal of Artificial Intelligence Research
Forward search value iteration for POMDPs

IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
Point-based value iteration: an anytime algorithm for POMDPs

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Belief selection in point-based planning algorithms for POMDPs

AI'06 Proceedings of the 19th international conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence

Quantified Score

Hi-index	0.00

Visualization

Abstract

Partially Observable Markov Decision Process (POMDP) provides a probabilistic model for decision making under uncertainty. Point-based value iteration algorithms are effective approximate algorithms to solve POMDP problems. Belief selection is a key step of point-based algorithm. In this paper we provide a belief selection method based on the uncertainty of belief point. The algorithm first computes the uncertainties of the belief points that could be reached, and then selects the belief points that have lower uncertainties and whose distances to the current belief set are larger than a threshold. The experimental results indicate that this method is effective to gain an approximate long-term discounted reward using fewer belief states than the other point-based algorithms.