Solving POMDPs by searching in policy space

  • Author: Eric A. Hansen
  • Affiliation: Computer Science Department, University of Massachusetts, Amherst, MA
  • Venue: UAI'98, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence
  • Year: 1998

Abstract

Most algorithms for solving POMDPs iteratively improve a value function that implicitly represents a policy; they are said to search in value function space. This paper presents an approach to solving POMDPs that represents a policy explicitly as a finite-state controller and iteratively improves the controller by searching in policy space. Two related algorithms illustrate this approach. The first is a policy iteration algorithm that can outperform value iteration in solving infinite-horizon POMDPs. It provides the foundation for a new heuristic search algorithm that promises further speedup by focusing computational effort on regions of the problem space that are reachable, or likely to be reached, from a start state.
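To make the finite-state-controller representation concrete, the sketch below shows the policy-evaluation step that underlies policy iteration for such controllers: each controller node q executes an action and transitions to a successor node on each observation, and the value V(q, s) of being in node q while the world is in state s satisfies a linear system that can be solved directly. The tiny two-state POMDP, the controller, and all numeric parameters here are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

# Hypothetical tiny POMDP (2 states, 2 actions, 2 observations).
# All numbers are illustrative only.
n_states, n_obs = 2, 2
gamma = 0.95

# T[a][s, s'] = P(s' | s, a); O[a][s', o] = P(o | s', a); R[s, a] = reward.
T = [np.array([[0.9, 0.1], [0.2, 0.8]]),
     np.array([[0.5, 0.5], [0.5, 0.5]])]
O = [np.array([[0.8, 0.2], [0.3, 0.7]]),
     np.array([[0.6, 0.4], [0.4, 0.6]])]
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# A finite-state controller: each node chooses an action and, for each
# observation received, moves to a successor node.
actions = [0, 1]          # actions[q] = action executed at node q
succ = [[0, 1], [1, 0]]   # succ[q][o] = next node after observing o
n_nodes = len(actions)

# Policy evaluation: V(q, s) solves the linear system
#   V(q,s) = R(s,a_q)
#            + gamma * sum_{s',o} T[a_q][s,s'] * O[a_q][s',o] * V(succ[q][o], s')
# Build (I - gamma * M) V = b and solve it exactly.
n = n_nodes * n_states
A = np.eye(n)
b = np.zeros(n)
for q in range(n_nodes):
    a = actions[q]
    for s in range(n_states):
        i = q * n_states + s
        b[i] = R[s, a]
        for s2 in range(n_states):
            for o in range(n_obs):
                j = succ[q][o] * n_states + s2
                A[i, j] -= gamma * T[a][s, s2] * O[a][s2, o]

V = np.linalg.solve(A, b).reshape(n_nodes, n_states)
print(V)  # V[q, s]: value of running the controller from node q in state s
```

Because gamma < 1 and the transition-observation products form a stochastic matrix, the system is always solvable; a policy-improvement step would then use these node values to add, change, or prune controller nodes, which is the part the paper's algorithms address.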