Anytime point-based approximations for large POMDPs

Authors:
Joelle Pineau; Geoffrey Gordon; Sebastian Thrun
Affiliations:
School of Computer Science, McGill University, Montréal, QC, Canada; Machine Learning Department, Carnegie Mellon University, Pittsburgh, PA; Computer Science Department, Stanford University, Stanford, CA
Venue:
Journal of Artificial Intelligence Research
Year:
2006

Abstract

The Partially Observable Markov Decision Process (POMDP) has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However, exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points that work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
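
To make the idea of a value backup at a specific belief point concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a small discrete POMDP stored as nested lists: T[a][s][s2] for transition probabilities, Z[a][s2][o] for observation probabilities, R[a][s] for rewards, and a discount gamma. These names, and the functions point_based_backup and pbvi_sweep, are hypothetical and chosen only for illustration. The value function is represented as a set of alpha vectors, and the backup returns the single best vector for one belief instead of enumerating vectors for the whole simplex. The belief selection step that the abstract emphasizes is not shown here.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b, alphas, T, Z, R, gamma):
    """Return the best one-step-lookahead alpha vector at belief b.

    b      : belief, a list of probabilities over states
    alphas : current value function, a list of alpha vectors
    T, Z, R, gamma : assumed model layout (see the lead-in text)
    """
    n_states = len(b)
    n_actions = len(T)
    n_obs = len(Z[0][0])
    best_vec, best_val = None, float("-inf")
    for a in range(n_actions):
        vec = [float(r) for r in R[a]]
        for o in range(n_obs):
            # Project every alpha vector back through the pair (a, o).
            projections = [
                [
                    sum(T[a][s][s2] * Z[a][s2][o] * alpha[s2]
                        for s2 in range(n_states))
                    for s in range(n_states)
                ]
                for alpha in alphas
            ]
            # Keep only the projection that is best at this belief.
            best_proj = max(projections, key=lambda g: dot(b, g))
            vec = [v + gamma * g for v, g in zip(vec, best_proj)]
        val = dot(b, vec)
        if val > best_val:
            best_vec, best_val = vec, val
    return best_vec


def pbvi_sweep(beliefs, alphas, T, Z, R, gamma):
    """One value-update pass: back up every belief point once."""
    return [point_based_backup(b, alphas, T, Z, R, gamma) for b in beliefs]


# Toy model (made up for illustration): 2 states, 2 actions, 1 observation.
T = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]  # states never change
Z = [[[1.0], [1.0]], [[1.0], [1.0]]]                      # uninformative sensor
R = [[1.0, 0.0], [0.0, 1.0]]                              # action i pays in state i
beliefs = [[0.9, 0.1], [0.1, 0.9]]

alphas = [[0.0, 0.0]]
for _ in range(2):
    alphas = pbvi_sweep(beliefs, alphas, T, Z, R, gamma=0.5)
print(alphas)
```

Each sweep produces at most one alpha vector per belief point, so the size of the value function is bounded by the number of points. This is what keeps the update tractable, and it is also why the quality of the result depends on which points are chosen.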