The recent proliferation of smartphones and other wearable devices has led to a surge of new mobile applications. Partially observable Markov decision processes (POMDPs) provide a natural framework for designing applications that continuously make decisions based on noisy sensor measurements. However, given limited battery life, the amount of online computation must be minimized. This can be achieved by compiling a policy into a finite state controller, which requires neither belief monitoring nor online search at execution time. In this paper, we propose a new branch and bound technique to search for a good controller. In contrast to many existing algorithms for controllers, our search technique is not subject to local optima. We also show how to reduce the amount of search by avoiding the enumeration of isomorphic controllers and by taking advantage of suitable upper and lower bounds. The approach is demonstrated on several benchmark problems as well as a smartphone application that assists persons with Alzheimer's disease with wayfinding.
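
To illustrate why a finite state controller needs no belief monitoring or online search, here is a minimal sketch of how such a controller could be executed: each node is labelled with an action, and each observation selects the next node by table lookup. The class name, the action and observation labels, and the two-node wayfinding controller are hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not implement the branch and bound search itself.

```python
from typing import Dict, List, Tuple


class FiniteStateController:
    """Deterministic controller: one action per node, one successor per (node, observation)."""

    def __init__(
        self,
        actions: Dict[int, str],
        transitions: Dict[Tuple[int, str], int],
        start: int = 0,
    ) -> None:
        self.actions = actions
        self.transitions = transitions
        self.node = start

    def act(self) -> str:
        # Choosing an action is a table lookup; no belief state is maintained.
        return self.actions[self.node]

    def observe(self, observation: str) -> None:
        # The observation alone determines the next controller node.
        self.node = self.transitions[(self.node, observation)]


def run_controller(controller: FiniteStateController, observations: List[str]) -> List[str]:
    """Alternate act/observe steps and return the actions that were chosen."""
    chosen = []
    for obs in observations:
        chosen.append(controller.act())
        controller.observe(obs)
    return chosen


# Hypothetical two-node controller for a wayfinding assistant (illustrative only).
actions = {0: "continue", 1: "prompt_turn"}
transitions = {
    (0, "on_path"): 0,
    (0, "off_path"): 1,
    (1, "on_path"): 0,
    (1, "off_path"): 1,
}

controller = FiniteStateController(actions, transitions)
trace = run_controller(controller, ["on_path", "off_path", "off_path", "on_path"])
print(trace)
```

The per-step cost is two dictionary lookups regardless of the size of the underlying POMDP, which is the property that makes compiled controllers attractive on battery-constrained devices.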