We propose a novel approach to optimizing Partially Observable Markov Decision Processes (POMDPs) defined on continuous spaces. To date, most algorithms for model-based POMDPs are restricted to discrete states, actions, and observations, yet many real-world problems, such as robot navigation, are naturally defined on continuous spaces. In this work, we demonstrate that the value function for continuous POMDPs is convex in the beliefs over continuous state spaces, and piecewise-linear convex in the particular case of discrete observations and actions with continuous states. We also demonstrate that continuous Bellman backups are contracting and isotonic, which ensures the monotonic convergence of value-iteration algorithms. Relying on these properties, we extend the Perseus algorithm, originally developed for discrete POMDPs, to continuous state spaces by representing the observation, transition, and reward models as Gaussian mixtures, and the beliefs as Gaussian mixtures or particle sets. With these representations, the integrals that appear in the Bellman backup can be computed in closed form, which makes the algorithm computationally feasible. Finally, we further extend Perseus to handle continuous action and observation sets by designing effective sampling approaches.
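
To make the closed-form claim concrete, the sketch below (not taken from the paper; a one-dimensional state and hypothetical function names are assumed) illustrates the key fact the Gaussian-mixture representation exploits: the product of two Gaussian mixtures, such as a belief and an observation likelihood, is again a Gaussian mixture whose weights, means, and variances have closed-form expressions, so the integrals in the Bellman backup can be evaluated analytically rather than numerically.

import math

def gaussian_pdf(x, mean, var):
    # Density of N(mean, var) evaluated at x.
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def product_of_mixtures(belief, likelihood):
    # Closed-form product of two 1-D Gaussian mixtures, renormalized.
    # Each mixture is a list of (weight, mean, variance) triples. The product
    # of two Gaussians is a scaled Gaussian, so the product of two mixtures
    # is again a mixture with one component per pair of components.
    components = []
    for (w_b, mu_b, var_b) in belief:
        for (w_l, mu_l, var_l) in likelihood:
            var_p = 1.0 / (1.0 / var_b + 1.0 / var_l)     # product variance
            mu_p = var_p * (mu_b / var_b + mu_l / var_l)  # product mean
            # Scale factor: how much the two Gaussians overlap.
            z = gaussian_pdf(mu_b, mu_l, var_b + var_l)
            components.append((w_b * w_l * z, mu_p, var_p))
    total = sum(w for (w, _, _) in components)
    return [(w / total, mu, var) for (w, mu, var) in components]

# Hypothetical belief and observation likelihood, both 1-D mixtures:
belief = [(0.6, 0.0, 1.0), (0.4, 3.0, 0.5)]
likelihood = [(1.0, 2.0, 0.8)]
posterior = product_of_mixtures(belief, likelihood)
print(posterior)  # again a Gaussian mixture, here with two weighted components

One practical consequence, not discussed in the abstract but common to methods of this kind, is that the number of mixture components grows multiplicatively with each product, so implementations typically compress the mixture between backups.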