Memoryless policies: theoretical limitations and practical results
SAB94: Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3
Acting optimally in partially observable stochastic domains
AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
Adaptive Behavior
Scalable Internal-State Policy-Gradient Methods for POMDPs
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
A POMDP formulation of preference elicitation problems
Eighteenth national conference on Artificial intelligence
Exact and approximate algorithms for partially observable Markov decision processes
Point-based value iteration: an anytime algorithm for POMDPs
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Solving POMDPs by searching the space of finite policies
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Learning finite-state controllers for partially observable environments
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Solving POMDPs by searching in policy space
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes
UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
An online POMDP algorithm for complex multiagent environments
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Exploiting belief bounds: practical POMDPs for personal assistant agents
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Memory analysis and significance test for agent behaviours
Proceedings of the 8th annual conference on Genetic and evolutionary computation
An online multi-agent co-operative learning algorithm in POMDPs
Journal of Experimental & Theoretical Artificial Intelligence
Perseus: randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research
Anytime point-based approximations for large POMDPs
Journal of Artificial Intelligence Research
Online planning algorithms for POMDPs
Journal of Artificial Intelligence Research
AEMS: an anytime online search algorithm for approximate policy refinement in large POMDPs
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs)
Proceedings of the 13th annual conference on Genetic and evolutionary computation
Analyzing and escaping local optima in planning as inference for partially observable domains
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Escaping local optima in POMDP planning as inference
The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Real-time decision making for large POMDPs
AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
Isomorph-free branch and bound search for finite state controllers
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
The search for finite-state controllers for partially observable Markov decision processes (POMDPs) is often based on approaches like gradient ascent, which are attractive because of their relatively low computational cost. In this paper, we illustrate a basic problem with gradient-based methods applied to POMDPs, rooted in the sequential nature of the decision problem, and propose a new stochastic local search method as an alternative. The heuristics used in our procedure mimic the sequential reasoning inherent in optimal dynamic programming (DP) approaches. We show that our algorithm consistently finds higher-quality controllers than gradient ascent, and is competitive with (and, for some problems, superior to) other state-of-the-art controller-based and DP-based algorithms on large-scale POMDPs.
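To make the kind of search the abstract describes concrete, here is a minimal, illustrative sketch of stochastic local search over small deterministic finite-state controllers, evaluated on the classic tiger POMDP. The greedy single-mutation hill climbing below is a simplified stand-in, not the paper's DP-inspired heuristic procedure, and all function names (`evaluate`, `local_search`) are ours.

```python
import random

S, A, O = 2, 3, 2   # states {tiger-left, tiger-right}, actions {listen, open-left, open-right}, obs {hear-left, hear-right}
GAMMA = 0.95
LISTEN = 0

def T(s, a, s2):
    """Transition model: listening leaves the state unchanged; opening a door resets it uniformly."""
    if a == LISTEN:
        return 1.0 if s2 == s else 0.0
    return 0.5

def Z(s2, a, o):
    """Observation model: listening reports the tiger's side with 0.85 accuracy; otherwise uninformative."""
    if a == LISTEN:
        return 0.85 if o == s2 else 0.15
    return 0.5

def R(s, a):
    """Reward: listening costs 1; opening the tiger's door costs 100, the other door pays 10."""
    if a == LISTEN:
        return -1.0
    return -100.0 if (a - 1) == s else 10.0

def evaluate(act, succ, n_nodes, sweeps=300):
    """Expected discounted value of the deterministic controller
    (action act[n] in node n, successor succ[n][o] on observation o),
    from a uniform initial belief and start node 0, by iterating the
    Bellman backup on the joint (state, node) Markov chain."""
    V = [[0.0] * n_nodes for _ in range(S)]
    for _ in range(sweeps):
        V = [[R(s, act[n]) + GAMMA * sum(
                  T(s, act[n], s2) * Z(s2, act[n], o) * V[s2][succ[n][o]]
                  for s2 in range(S) for o in range(O))
              for n in range(n_nodes)] for s in range(S)]
    return 0.5 * (V[0][0] + V[1][0])

def local_search(n_nodes=3, steps=150, seed=0):
    """Greedy hill climbing: mutate one controller entry at a time and
    keep the mutation iff the controller's value does not decrease.
    Returns (initial value, best value found)."""
    rng = random.Random(seed)
    act = [rng.randrange(A) for _ in range(n_nodes)]
    succ = [[rng.randrange(n_nodes) for _ in range(O)] for _ in range(n_nodes)]
    v0 = best = evaluate(act, succ, n_nodes)
    for _ in range(steps):
        if rng.random() < 0.5:            # mutate a node's action
            n = rng.randrange(n_nodes)
            old, act[n] = act[n], rng.randrange(A)
            if (v := evaluate(act, succ, n_nodes)) >= best:
                best = v
            else:
                act[n] = old
        else:                             # mutate an observation transition
            n, o = rng.randrange(n_nodes), rng.randrange(O)
            old, succ[n][o] = succ[n][o], rng.randrange(n_nodes)
            if (v := evaluate(act, succ, n_nodes)) >= best:
                best = v
            else:
                succ[n][o] = old
    return v0, best
```

A handy sanity check on the evaluator: the one-node always-listen controller earns -1 per step regardless of state, so its value is exactly -1/(1 - γ) = -20 for γ = 0.95.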