Planning and acting in partially observable stochastic domains
Artificial Intelligence
Heuristic search value iteration for POMDPs
UAI '04 Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence
Point-based value iteration: an anytime algorithm for POMDPs
IJCAI'03 Proceedings of the 18th International Joint Conference on Artificial Intelligence
Improving approximate value iteration using memories and predictive state representations
AAAI'06 Proceedings of the 21st National Conference on Artificial Intelligence - Volume 1
Improving anytime point-based value iteration using principled point selections
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Using rewards for belief state updates in partially observable Markov decision processes
ECML'05 Proceedings of the 16th European Conference on Machine Learning
Sequential decision making under uncertainty
SARA'05 Proceedings of the 6th International Conference on Abstraction, Reformulation and Approximation
Belief selection in point-based planning algorithms for POMDPs
AI'06 Proceedings of the 19th International Conference on Advances in Artificial Intelligence: Canadian Society for Computational Studies of Intelligence
A survey of point-based POMDP solvers
Autonomous Agents and Multi-Agent Systems
Recent research on point-based approximation algorithms for POMDPs has demonstrated that good solutions to POMDP problems can be obtained without considering the entire belief simplex. For instance, the Point-Based Value Iteration (PBVI) algorithm [Pineau et al., 2003] computes the value function only for a small set of belief states and iteratively adds more points to the set as needed. A key component of the algorithm is the strategy for selecting belief points so that the space of reachable beliefs is well covered. This paper presents a new method for selecting an initial set of representative belief points, which relies on first finding a basis for the reachable belief simplex. Our approach has better worst-case performance than the original PBVI heuristic and performs well on several standard POMDP tasks.
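The point-based backup at the heart of PBVI can be illustrated on a toy problem. The sketch below (all model numbers are illustrative assumptions, loosely following the classic "tiger" POMDP, not the paper's benchmarks) maintains a set of alpha-vectors and performs the standard Bellman backup only at a fixed, small set of belief points, as the abstract describes:

```python
import numpy as np

# Hypothetical tiny "tiger" POMDP; all parameter values are illustrative.
S, A, Z = 2, 3, 2            # states, actions (listen/open-left/open-right), observations
gamma = 0.95
# T[a, s, s']: listening (a=0) keeps the state; opening a door resets uniformly.
T = np.empty((A, S, S))
T[0] = np.eye(S)
T[1] = T[2] = 0.5
# O[a, s', z]: listening is 85% accurate; opening a door is uninformative.
O = np.empty((A, S, Z))
O[0] = [[0.85, 0.15], [0.15, 0.85]]
O[1] = O[2] = 0.5
# R[s, a]: listening costs 1; the tiger's door costs 100, the other pays 10.
R = np.array([[-1.0, -100.0, 10.0],
              [-1.0, 10.0, -100.0]])

def backup(b, Gamma):
    """Point-based Bellman backup at belief b against alpha-vector set Gamma."""
    best = None
    for a in range(A):
        alpha_a = R[:, a].copy()
        for z in range(Z):
            # Project every alpha-vector back through action a, observation z.
            g = np.array([T[a] @ (O[a, :, z] * alpha) for alpha in Gamma])
            alpha_a += gamma * g[np.argmax(g @ b)]
        if best is None or alpha_a @ b > best @ b:
            best = alpha_a
    return best

# One PBVI-style loop over a small, fixed belief set (the paper's contribution
# is precisely a smarter way to choose such a set).
B = [np.array([0.5, 0.5]), np.array([0.85, 0.15]), np.array([0.15, 0.85])]
Gamma = [np.zeros(S)]
for _ in range(30):
    Gamma = [backup(b, Gamma) for b in B]

v = max(alpha @ B[0] for alpha in Gamma)
print(v)   # approximate value at the uniform belief
```

Because each backup produces one alpha-vector per belief point, the value function stays small (here, at most three vectors) while remaining a valid lower bound over the whole simplex; the quality of that bound depends directly on how well the belief set covers the reachable beliefs.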