Perseus: randomized point-based value iteration for POMDPs

Authors:
Matthijs T. J. Spaan; Nikos Vlassis
Affiliations:
Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands;Informatics Institute, University of Amsterdam, Amsterdam, The Netherlands
Venue:
Journal of Artificial Intelligence Research
Year:
2005

Abstract

Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called PERSEUS. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. In contrast to other point-based methods, PERSEUS backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to deal with continuous action spaces. Experimental results show the potential of PERSEUS in large-scale POMDP problems.
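The backup stage described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not spell out the backup operator, so the sketch uses the standard point-based Bellman backup over alpha vectors. The model arrays `T`, `O`, `R`, the Tiger-like toy numbers, the fixed grid of beliefs `B`, and all function names are hypothetical choices made for the example.

```python
import random

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def value(b, V):
    """Value of belief b under a set of alpha vectors V."""
    return max(dot(b, alpha) for alpha in V)

def backup(b, V, T, O, R, gamma):
    """Standard point-based backup at belief b (assumed, not from the abstract).

    T[a][s][s2] = P(s2 | s, a), O[a][s2][o] = P(o | s2, a), R[a][s] = reward.
    """
    n = len(b)
    best = None
    for a in range(len(T)):
        g_a = list(R[a])
        for o in range(len(O[a][0])):
            # One back-projected vector per alpha vector in V.
            cands = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n))
                 for s in range(n)]
                for alpha in V
            ]
            g_ao = max(cands, key=lambda g: dot(b, g))
            g_a = [x + gamma * y for x, y in zip(g_a, g_ao)]
        if best is None or dot(b, g_a) > dot(b, best):
            best = g_a
    return best

def perseus_stage(B, V, T, O, R, gamma, rng):
    """One randomized backup stage: back up sampled points until every
    belief in B has a value at least as high as under V."""
    old = [value(b, V) for b in B]
    V_new = []
    todo = list(range(len(B)))          # beliefs not yet improved
    while todo:
        i = rng.choice(todo)            # random belief among those remaining
        alpha = backup(B[i], V, T, O, R, gamma)
        if dot(B[i], alpha) < old[i]:
            # Backup did not help at this point: keep its old best vector.
            alpha = max(V, key=lambda v: dot(B[i], v))
        V_new.append(alpha)
        # A single new vector may improve many beliefs at once.
        todo = [j for j in todo if value(B[j], V_new) < old[j]]
    return V_new

# Hypothetical Tiger-like toy model: 2 states, 3 actions, 2 observations.
T = [[[1.0, 0.0], [0.0, 1.0]],          # listen: state unchanged
     [[0.5, 0.5], [0.5, 0.5]],          # open left: problem resets
     [[0.5, 0.5], [0.5, 0.5]]]          # open right: problem resets
O = [[[0.85, 0.15], [0.15, 0.85]],      # listening is informative
     [[0.5, 0.5], [0.5, 0.5]],
     [[0.5, 0.5], [0.5, 0.5]]]
R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]
gamma = 0.95

B = [[p / 10, 1 - p / 10] for p in range(11)]   # assumed fixed belief set
V0 = [[-100.0 / (1 - gamma)] * 2]               # pessimistic initial vector
V = V0
rng = random.Random(0)
for _ in range(20):
    V = perseus_stage(B, V, T, O, R, gamma, rng)
```

The loop in `perseus_stage` stops as soon as no belief in `B` has a lower value than before, so a stage typically needs far fewer backups than there are belief points; the resulting vector set never contains more vectors than `B` has points. Starting from a pessimistic lower bound such as `V0` is one way to make sure the first backups can only raise values.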