Point-based policy iteration

  • Authors:
  • Shihao Ji, Ronald Parr, Hui Li, Xuejun Liao, Lawrence Carin

  • Affiliations:
  • Department of Electrical and Computer Engineering, Duke University, Durham, NC (Ji, Li, Liao, Carin); Department of Computer Science and Department of Electrical and Computer Engineering, Duke University, Durham, NC (Parr)

  • Venue:
  • AAAI'07: Proceedings of the 22nd National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2007

Abstract

We describe a point-based policy iteration (PBPI) algorithm for infinite-horizon POMDPs. PBPI replaces the exact policy improvement step of Hansen's policy iteration with point-based value iteration (PBVI). Despite being an approximate algorithm, PBPI is monotonic: at each iteration before convergence, PBPI produces a policy whose value increases for at least one of a finite set of initial belief states and decreases for none of them. In contrast, PBVI cannot guarantee monotonic improvement of the value function or the policy. In practice, PBPI generally requires a lower density of belief-point coverage of the simplex and tends to produce superior policies with less computation. Experiments on several benchmark problems (up to 12,545 states) demonstrate the scalability and robustness of the PBPI algorithm.
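
To make the improvement step concrete, the sketch below implements a PBVI-style point-based Bellman backup over a fixed set of belief points on the classic Tiger POMDP. This is only an illustration of the backup that PBPI borrows from PBVI for policy improvement, not the authors' full algorithm: PBPI additionally evaluates the resulting finite-state-controller policy exactly between improvement steps, which is omitted here. The model parameters (0.85 listening accuracy, a 0.95 discount, the chosen belief grid) are standard textbook assumptions, not values taken from the paper.

```python
# Point-based (PBVI-style) backups on the classic Tiger POMDP.
# Sketch of the improvement step only; PBPI's exact policy-evaluation step is omitted.
import numpy as np

S, A, O = 2, 3, 2          # states: tiger-left/right; actions: listen, open-left, open-right
gamma = 0.95               # assumed discount factor

# T[a, s, s'] : transition probabilities
T = np.zeros((A, S, S))
T[0] = np.eye(S)           # listen: state unchanged
T[1] = T[2] = 0.5          # opening a door resets the problem to a uniform belief

# Z[a, s', o] : observation probabilities
Z = np.zeros((A, S, O))
Z[0] = [[0.85, 0.15], [0.15, 0.85]]   # listen: hear the correct side with prob 0.85
Z[1] = Z[2] = 0.5                      # after opening, observations are uninformative

# R[a, s] : expected immediate reward
R = np.array([[-1.0, -1.0],            # listen
              [-100.0, 10.0],          # open-left
              [10.0, -100.0]])         # open-right

def point_based_backup(b, alphas):
    """One point-based Bellman backup at belief b; returns the best new alpha-vector."""
    best_val, best_alpha = -np.inf, None
    for a in range(A):
        g_a = R[a].copy()
        for o in range(O):
            # g[s] = sum_{s'} T(s'|s,a) Z(o|s',a) alpha(s'), for every current alpha
            projections = np.array([T[a] @ (Z[a][:, o] * alpha) for alpha in alphas])
            # keep the projection that is best at this belief point
            g_a += gamma * projections[np.argmax(projections @ b)]
        if g_a @ b > best_val:
            best_val, best_alpha = g_a @ b, g_a
    return best_alpha

# Finite set of belief points (probability that the tiger is behind the left door)
beliefs = [np.array([p, 1.0 - p]) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]

alphas = [np.zeros(S)]                 # start from the zero value function
for _ in range(60):
    alphas = [point_based_backup(b, alphas) for b in beliefs]

for b in beliefs:
    print(f"b(tiger-left)={b[0]:.2f}  value={max(a @ b for a in alphas):.2f}")
```

In PBPI, backups of this kind are used only to improve the controller at the chosen belief points, while the value of the improved policy is then computed exactly; that combination is what yields the monotonicity property stated in the abstract, which plain PBVI iteration as sketched above does not guarantee.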