The past decade has seen a significant breakthrough in research on solving partially observable Markov decision processes (POMDPs). Where earlier solvers could not scale beyond perhaps a dozen states, modern solvers can handle complex domains with many thousands of states. This breakthrough is largely due to the idea of restricting value function computations to a finite subset of the belief space, permitting only local value updates over this subset. This approach, known as point-based value iteration, avoids the exponential growth of the value function and is thus applicable to domains with longer horizons, even with relatively large state spaces. Many extensions of this basic idea have been suggested, focusing on various aspects of the algorithm, mainly the selection of the belief-space subset and the order of value function updates. In this survey, we walk the reader through the fundamentals of point-based value iteration, explaining the main concepts and ideas. We then survey the major extensions to the basic algorithm and discuss their merits. Finally, we include an extensive empirical analysis on well-known benchmarks, in order to shed light on the strengths and limitations of the various approaches.
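To make the "local value updates" idea concrete, the core operation shared by point-based solvers is the Bellman backup at a single belief point: it produces one new alpha-vector that is optimal at that belief (and typically nearby beliefs), rather than enumerating all vectors needed over the whole simplex. The sketch below illustrates this for a small tabular POMDP; the array layout (`T[a, s, s']`, `O[a, s', o]`, `R[s, a]`) and the function name are illustrative assumptions, not any particular solver's API.

```python
import numpy as np

def point_based_backup(b, Gamma, T, O, R, gamma):
    """One point-based Bellman backup at belief b (an illustrative sketch).

    b     : belief, shape (S,)
    Gamma : current value function, a list of alpha-vectors of shape (S,)
    T     : transition model, T[a, s, s'] = P(s' | s, a)
    O     : observation model, O[a, s', o] = P(o | s', a)
    R     : reward model, R[s, a]
    gamma : discount factor
    """
    A, S, _ = T.shape
    num_obs = O.shape[2]
    best_alpha, best_val = None, -np.inf
    for a in range(A):
        alpha_a = R[:, a].astype(float).copy()
        for o in range(num_obs):
            # Project every alpha-vector back through action a, observation o:
            # g(s) = sum_{s'} T[a,s,s'] * O[a,s',o] * alpha(s')
            projections = [T[a] @ (O[a, :, o] * alpha) for alpha in Gamma]
            # Keep only the projection that is best at this particular belief
            # -- this is what makes the update "local" to b.
            alpha_a += gamma * max(projections, key=lambda g: g @ b)
        if alpha_a @ b > best_val:
            best_alpha, best_val = alpha_a, alpha_a @ b
    return best_alpha  # one new alpha-vector, maximizing value at b
```

A point-based solver repeatedly applies this backup over its chosen finite belief set, so the value function grows by at most one vector per backup instead of exponentially with the horizon.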