Planning and acting in partially observable stochastic domains
Artificial Intelligence
Heuristic search value iteration for POMDPs
UAI '04 Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence
Point-based value iteration: an anytime algorithm for POMDPs
IJCAI'03 Proceedings of the 18th International Joint Conference on Artificial Intelligence
Improving approximate value iteration using memories and predictive state representations
AAAI'06 Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1
Improving anytime point-based value iteration using principled point selections
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Using rewards for belief state updates in partially observable Markov decision processes
ECML'05 Proceedings of the 16th European Conference on Machine Learning
Sequential decision making under uncertainty
SARA'05 Proceedings of the 6th International Conference on Abstraction, Reformulation and Approximation
Belief selection in point-based planning algorithms for POMDPs
AI'06 Proceedings of the 19th International Conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence
A survey of point-based POMDP solvers
Autonomous Agents and Multi-Agent Systems
Recent research on point-based approximation algorithms for POMDPs has demonstrated that good solutions to POMDP problems can be obtained without considering the entire belief simplex. For instance, the Point-Based Value Iteration (PBVI) algorithm [Pineau et al., 2003] computes the value function only for a small set of belief states and iteratively adds more points to the set as needed. A key component of the algorithm is the strategy for selecting belief points so that the space of reachable beliefs is well covered. This paper presents a new method for selecting an initial set of representative belief points, which relies on first finding a basis for the reachable belief simplex. Our approach has better worst-case performance than the original PBVI heuristic and performs well on several standard POMDP tasks.
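The point-based backup at the heart of PBVI can be illustrated on a toy problem. The sketch below (all model numbers are illustrative assumptions, loosely following the classic "tiger" POMDP, not the paper's benchmarks) maintains a set of alpha-vectors and performs the standard Bellman backup only at a fixed, small set of belief points, as the abstract describes:

```python
import numpy as np

# Hypothetical tiny "tiger" POMDP; all parameter values are illustrative.
S, A, Z = 2, 3, 2            # states, actions (listen/open-left/open-right), observations
gamma = 0.95
# T[a, s, s']: listening (a=0) keeps the state; opening a door resets uniformly.
T = np.empty((A, S, S))
T[0] = np.eye(S)
T[1] = T[2] = 0.5
# O[a, s', z]: listening is 85% accurate; opening a door is uninformative.
O = np.empty((A, S, Z))
O[0] = [[0.85, 0.15], [0.15, 0.85]]
O[1] = O[2] = 0.5
# R[s, a]: listening costs 1; the tiger's door costs 100, the other pays 10.
R = np.array([[-1.0, -100.0, 10.0],
              [-1.0, 10.0, -100.0]])

def backup(b, Gamma):
    """Point-based Bellman backup at belief b against alpha-vector set Gamma."""
    best = None
    for a in range(A):
        alpha_a = R[:, a].copy()
        for z in range(Z):
            # Project every alpha-vector back through action a, observation z.
            g = np.array([T[a] @ (O[a, :, z] * alpha) for alpha in Gamma])
            alpha_a += gamma * g[np.argmax(g @ b)]
        if best is None or alpha_a @ b > best @ b:
            best = alpha_a
    return best

# One PBVI-style loop over a small, fixed belief set (the paper's contribution
# is precisely a smarter way to choose such a set).
B = [np.array([0.5, 0.5]), np.array([0.85, 0.15]), np.array([0.15, 0.85])]
Gamma = [np.zeros(S)]
for _ in range(30):
    Gamma = [backup(b, Gamma) for b in B]

v = max(alpha @ B[0] for alpha in Gamma)
print(v)   # approximate value at the uniform belief
```

Because each backup produces one alpha-vector per belief point, the value function stays small (here, at most three vectors) while remaining a valid lower bound over the whole simplex; the quality of that bound depends directly on how well the belief set covers the reachable beliefs.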