Compact, convex upper bound iteration for approximate POMDP planning

  • Authors:
  • Tao Wang; Pascal Poupart; Michael Bowling; Dale Schuurmans

  • Affiliations:
  • University of Alberta; University of Waterloo; University of Alberta; University of Alberta

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2006

Abstract

Partially observable Markov decision processes (POMDPs) are an intuitive and general way to model sequential decision-making problems under uncertainty. Unfortunately, even approximate planning in POMDPs is known to be hard, and developing heuristic planners that deliver reasonable results in practice has proved to be a significant challenge. In this paper, we present a new approach to approximate value iteration for POMDP planning that is based on quadratic rather than piecewise-linear function approximators. Specifically, we approximate the optimal value function by a convex upper bound composed of a fixed number of quadratics, and optimize it at each stage by semidefinite programming. We demonstrate that our approach achieves approximation quality competitive with current techniques while maintaining a bounded-size representation of the function approximator. Moreover, an upper bound on the optimal value function can be preserved if required. Overall, the technique requires computation time and space that are only linear in the number of iterations (the planning horizon).
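
The key step the abstract describes, replacing the usual piecewise-linear value function with a convex quadratic upper bound optimized by semidefinite programming, can be illustrated with a small sketch. This is not the authors' implementation: fit_quadratic_upper_bound, alphas, and beliefs are hypothetical names, it fits a single quadratic where the paper maintains a fixed number of them, and the bound is enforced only at sampled beliefs here, whereas the paper's semidefinite formulation preserves a true upper bound over the whole belief simplex.

```python
# A minimal sketch, assuming numpy and cvxpy are available. It fits one
# convex quadratic upper bound to a piecewise-linear (PWL) POMDP value
# function represented by a set of alpha-vectors.
import numpy as np
import cvxpy as cp

def fit_quadratic_upper_bound(alphas, beliefs):
    """Fit V_hat(b) = b'Qb + r'b + c with Q PSD (so V_hat is convex),
    subject to V_hat(b) >= max_k alpha_k'b at each sampled belief, while
    minimizing the total gap so the bound stays tight on the samples."""
    n = alphas.shape[1]                   # number of hidden states
    Q = cp.Variable((n, n), PSD=True)     # PSD Hessian keeps V_hat convex
    r = cp.Variable(n)
    c = cp.Variable()

    constraints, gaps = [], []
    for b in beliefs:
        pwl = float(np.max(alphas @ b))   # PWL value at this belief
        v_hat = cp.quad_form(b, Q) + r @ b + c
        constraints.append(v_hat >= pwl)  # upper-bound the sample
        gaps.append(v_hat - pwl)

    prob = cp.Problem(cp.Minimize(sum(gaps)), constraints)
    prob.solve(solver=cp.SCS)             # SCS handles the PSD cone
    return Q.value, r.value, c.value

# Toy usage: a 2-state value function given by two alpha-vectors,
# with beliefs sampled along the 1-simplex.
alphas = np.array([[1.0, 0.0],
                   [0.0, 1.0]])
ts = np.linspace(0.0, 1.0, 11)
beliefs = np.stack([ts, 1.0 - ts], axis=1)
Q, r, c = fit_quadratic_upper_bound(alphas, beliefs)
```

Because the pointwise maximum of convex quadratics is itself convex, keeping a fixed number of them caps the size of the representation at every backup; this is what gives the linear dependence of time and space on the horizon, in contrast to exact value iteration, where the number of alpha-vectors can grow exponentially with each backup.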