An improved policy iteration algorithm for partially observable MDPs
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Exact and approximate algorithms for partially observable markov decision processes
Exact and approximate algorithms for partially observable markov decision processes
Tractable planning under uncertainty: exploiting structure
Tractable planning under uncertainty: exploiting structure
Exploiting structure to efficiently solve large scale partially observable markov decision processes
Exploiting structure to efficiently solve large scale partially observable markov decision processes
Probabilistic inference for solving discrete and continuous state Markov Decision Processes
ICML '06 Proceedings of the 23rd international conference on Machine learning
Model-free reinforcement learning as mixture learning
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Stochastic local search for POMDP controllers
AAAI'04 Proceedings of the 19th national conference on Artifical intelligence
Solving POMDPs using quadratically constrained linear programs
IJCAI'07 Proceedings of the 20th international joint conference on Artifical intelligence
New inference strategies for solving Markov decision processes using reversible jump MCMC
UAI '09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
Solving POMDPs by searching the space of finite policies
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Isomorph-free branch and bound search for finite state controllers
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Hi-index | 0.00 |
Planning as inference recently emerged as a versatile approach to decision-theoretic planning and reinforcement learning for single and multi-agent systems in fully and partially observable domains with discrete and continuous variables. Since planning as inference essentially tackles a non-convex optimization problem when the states are partially observable, there is a need to develop techniques that can robustly escape local optima. We investigate the local optima of finite state controllers in single agent partially observable Markov decision processes (POMDPs) that are optimized by expectation maximization (EM). We show that EM converges to controllers that are optimal with respect to a one-step lookahead. To escape local optima, we propose two algorithms: the first one adds nodes to the controller to ensure optimality with respect to a multi-step lookahead, while the second one splits nodes in a greedy fashion to improve reward likelihood. The approaches are demonstrated empirically on benchmark problems.