Heuristic search value iteration for POMDPs

Authors:
Trey Smith;Reid Simmons
Affiliations:
Carnegie Mellon University;Carnegie Mellon University
Venue:
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence
Year:
2004


Abstract

We present a novel POMDP planning algorithm called heuristic search value iteration (HSVI). HSVI is an anytime algorithm that returns a policy and a provable bound on its regret with respect to the optimal policy. HSVI gets its power by combining two well-known techniques: attention-focusing search heuristics and piecewise linear convex representations of the value function. HSVI's soundness and convergence have been proven. On some benchmark problems from the literature, HSVI displays speedups of more than a factor of 100 with respect to other state-of-the-art POMDP value iteration algorithms. We also apply HSVI to a new rover exploration problem 10 times larger than most POMDP problems in the literature.
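
As a rough illustration of the piecewise linear convex representation mentioned in the abstract, the sketch below evaluates a value function stored as a set of vectors, taking the maximum dot product with the belief. It is not the authors' implementation. The function names, the two-state example, the upper-bound value of 0.75, and the tolerance are all hypothetical, and treating the regret bound as the gap between an upper and a lower bound at the initial belief is an assumption made for the example.

```python
def pwlc_value(alpha_vectors, belief):
    """Evaluate V(b) = max over alpha of (alpha . b).

    Each alpha vector holds one value per state; the belief is a
    probability distribution over the same states.
    """
    return max(
        sum(a * p for a, p in zip(alpha, belief))
        for alpha in alpha_vectors
    )


def bound_gap(upper_value, alpha_vectors, belief):
    """Gap between an assumed upper bound and the PWLC lower bound at a belief."""
    return upper_value - pwlc_value(alpha_vectors, belief)


# Hypothetical two-state problem with three alpha vectors.
alphas = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]
b0 = [0.5, 0.5]

lower = pwlc_value(alphas, b0)      # lower bound at the initial belief
gap = bound_gap(0.75, alphas, b0)   # 0.75 is a made-up upper-bound value

# An anytime planner could stop once the gap falls below a chosen tolerance.
epsilon = 0.2                       # hypothetical tolerance
converged = gap <= epsilon
```

Because the function is a maximum of linear functions of the belief, it is convex: its value at a mixture of two beliefs never exceeds the same mixture of the values at those beliefs.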