Forward search value iteration for POMDPs

Authors:
Guy Shani; Ronen I. Brafman; Solomon E. Shimony
Affiliations:
Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel (all three authors)
Venue:
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Year:
2007

Abstract

The recent scaling up of POMDP solvers towards realistic applications is largely due to point-based methods, which quickly converge to an approximate solution for medium-sized problems. Within this family, HSVI, which uses trial-based asynchronous value iteration, can handle the largest domains. In this paper we suggest a new algorithm, FSVI, that uses the underlying MDP to traverse the belief space towards rewards, finding sequences of useful backups, and we show that it scales up better than HSVI on larger benchmarks.
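
The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hypothetical 3-state corridor POMDP, and every name in it (`T`, `R`, `Z`, `mdp_q`, `backup`, `fsvi_trial`) is invented for illustration. A trial samples a hidden state, follows the action that is greedy for the underlying MDP in that state, tracks the resulting beliefs, and then applies point-based backups to those beliefs in reverse order.

```python
import random

# Hypothetical 3-state corridor POMDP (not from the paper); state 2 is an absorbing goal.
S, A, O, GOAL, GAMMA = 3, 2, 2, 2, 0.95   # actions: 0 = left, 1 = right

def T(s, a):
    """P(s' | s, a): a move succeeds with probability 0.9, otherwise the agent stays."""
    if s == GOAL:
        return [0.0, 0.0, 1.0]
    target = max(s - 1, 0) if a == 0 else s + 1
    p = [0.0] * S
    p[target] += 0.9
    p[s] += 0.1
    return p

def R(s, a):
    """Reward 1 for attempting to step right from the state next to the goal."""
    return 1.0 if (s == 1 and a == 1) else 0.0

def Z(s2, o):
    """P(o | s'): observation 1 means 'goal' and is correct 90% of the time."""
    return 0.9 if (s2 == GOAL) == (o == 1) else 0.1

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def sample(dist, rng):
    return rng.choices(range(len(dist)), weights=dist)[0]

def mdp_q(iters=200):
    """Q-values of the underlying fully observable MDP, by value iteration."""
    V = [0.0] * S
    for _ in range(iters):
        Q = [[R(s, a) + GAMMA * dot(T(s, a), V) for a in range(A)] for s in range(S)]
        V = [max(q) for q in Q]
    return Q

def belief_update(b, a, o):
    nb = [Z(s2, o) * sum(b[s] * T(s, a)[s2] for s in range(S)) for s2 in range(S)]
    norm = sum(nb)
    return [x / norm for x in nb]

def backup(b, V):
    """Standard point-based backup: the best new alpha-vector for belief b."""
    best = None
    for a in range(A):
        g_a = [R(s, a) for s in range(S)]
        for o in range(O):
            cands = [[sum(T(s, a)[s2] * Z(s2, o) * alpha[s2] for s2 in range(S))
                      for s in range(S)] for alpha in V]
            g_ao = max(cands, key=lambda g: dot(b, g))
            g_a = [x + GAMMA * y for x, y in zip(g_a, g_ao)]
        if best is None or dot(b, g_a) > dot(b, best):
            best = g_a
    return best

def fsvi_trial(b0, V, Q, rng, max_depth=20):
    """One forward trial guided by the MDP, followed by backups in reverse order."""
    s, b, path = sample(b0, rng), b0, []
    while s != GOAL and len(path) < max_depth:
        path.append(b)
        a = max(range(A), key=lambda x: Q[s][x])         # greedy for the MDP state
        s = sample(T(s, a), rng)                         # hidden next state
        o = sample([Z(s, o2) for o2 in range(O)], rng)   # observation it emits
        b = belief_update(b, a, o)
    path.append(b)
    for belief in reversed(path):                        # back up from the end of the path
        V.append(backup(belief, V))
    return path

rng = random.Random(0)
Q = mdp_q()
V = [[0.0] * S]              # all-zero vector: a valid lower bound since rewards are >= 0
b0 = [0.5, 0.5, 0.0]
for _ in range(10):
    fsvi_trial(b0, V, Q, rng)
print(max(dot(b0, alpha) for alpha in V))   # current lower-bound value at b0
```

Backing up in reverse order lets value found near the reward propagate toward the initial belief within a single trial. The depth cap and the all-zero initial value function are simplifications chosen for this sketch.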