Heuristic search value iteration for POMDPs
UAI '04: Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence
An online POMDP algorithm for complex multiagent environments
Proceedings of the Fourth International Joint Conference on Autonomous Agents and Multiagent Systems
AEMS: an anytime online search algorithm for approximate policy refinement in large POMDPs
IJCAI '07: Proceedings of the 20th International Joint Conference on Artificial Intelligence
Point-based value iteration: an anytime algorithm for POMDPs
IJCAI '03: Proceedings of the 18th International Joint Conference on Artificial Intelligence
We introduce a new backup operator for point-based POMDP algorithms that performs a look-ahead search to a depth greater than one. We apply this operator in a new algorithm, called Stochastic Search Value Iteration (SSVI), which relies on stochastic exploration of the environment to update the value function, in contrast with existing point-based POMDP algorithms. The ideas underlying SSVI are close to those of temporal-difference learning algorithms for MDPs; in particular, SSVI takes advantage of a soft-max action selection function and of the inherent stochasticity of the environment. Empirical results on standard benchmark problems show that our algorithm performs slightly better and runs slightly faster than HSVI2, the state-of-the-art algorithm. This suggests that stochastic algorithms are a viable alternative for solving large POMDPs.
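The exploration strategy described above can be sketched as follows. This is a minimal, hypothetical illustration of soft-max action selection driving a stochastic walk through belief points; the names (`softmax_action`, `explore`, `q_fn`, `step_fn`, the temperature parameter) are assumptions for illustration, not the authors' actual SSVI implementation.

```python
# Hypothetical sketch: soft-max action selection and stochastic
# belief-point exploration, in the spirit of the abstract's description.
import math
import random

def softmax_action(q_values, temperature=1.0):
    """Pick an action index with probability proportional to exp(Q/T)."""
    weights = [math.exp(q / temperature) for q in q_values]
    total = sum(weights)
    r = random.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off

def explore(belief, q_fn, step_fn, depth):
    """Collect a trajectory of belief points via soft-max exploration.

    q_fn(belief)          -> list of Q-value estimates, one per action.
    step_fn(belief, a)    -> next belief (samples an observation, so the
                             environment's stochasticity drives exploration).
    """
    trajectory = [belief]
    for _ in range(depth):
        a = softmax_action(q_fn(belief))
        belief = step_fn(belief, a)
        trajectory.append(belief)
    return trajectory
```

In a point-based scheme, the beliefs collected along such a trajectory would then be backed up, typically in reverse order, to update the value function.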