Memoryless policies: theoretical limitations and practical results
SAB94: Proceedings of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats 3
Acting optimally in partially observable stochastic domains
AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
Adaptive Behavior
Scalable Internal-State Policy-Gradient Methods for POMDPs
ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
A POMDP formulation of preference elicitation problems
Eighteenth national conference on Artificial intelligence
Exact and approximate algorithms for partially observable Markov decision processes
Point-based value iteration: an anytime algorithm for POMDPs
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Solving POMDPs by searching the space of finite policies
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Learning finite-state controllers for partially observable environments
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Solving POMDPs by searching in policy space
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Incremental pruning: a simple, fast, exact method for partially observable Markov decision processes
UAI'97 Proceedings of the Thirteenth conference on Uncertainty in artificial intelligence
An online POMDP algorithm for complex multiagent environments
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Exploiting belief bounds: practical POMDPs for personal assistant agents
Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems
Memory analysis and significance test for agent behaviours
Proceedings of the 8th annual conference on Genetic and evolutionary computation
An online multi-agent co-operative learning algorithm in POMDPs
Journal of Experimental & Theoretical Artificial Intelligence
Perseus: randomized point-based value iteration for POMDPs
Journal of Artificial Intelligence Research
Anytime point-based approximations for large POMDPs
Journal of Artificial Intelligence Research
Online planning algorithms for POMDPs
Journal of Artificial Intelligence Research
AEMS: an anytime online search algorithm for approximate policy refinement in large POMDPs
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs)
Proceedings of the 13th annual conference on Genetic and evolutionary computation
Analyzing and escaping local optima in planning as inference for partially observable domains
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Escaping local optima in POMDP planning as inference
The 10th International Conference on Autonomous Agents and Multiagent Systems - Volume 3
Real-time decision making for large POMDPs
AI'05 Proceedings of the 18th Canadian Society conference on Advances in Artificial Intelligence
Isomorph-free branch and bound search for finite state controllers
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
The search for finite-state controllers for partially observable Markov decision processes (POMDPs) is often based on approaches like gradient ascent, which are attractive because of their relatively low computational cost. In this paper, we illustrate a basic problem with gradient-based methods applied to POMDPs, rooted in the sequential nature of the decision problem, and propose a new stochastic local search method as an alternative. The heuristics used in our procedure mimic the sequential reasoning inherent in optimal dynamic programming (DP) approaches. We show that our algorithm consistently finds higher-quality controllers than gradient ascent, and is competitive with (and, for some problems, superior to) other state-of-the-art controller-based and DP-based algorithms on large-scale POMDPs.
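To make the kind of search the abstract describes concrete, here is a minimal, illustrative sketch of stochastic local search over small deterministic finite-state controllers, evaluated on the classic tiger POMDP. The greedy single-mutation hill climbing below is a simplified stand-in, not the paper's DP-inspired heuristic procedure, and all function names (`evaluate`, `local_search`) are ours.

```python
import random

S, A, O = 2, 3, 2   # states {tiger-left, tiger-right}, actions {listen, open-left, open-right}, obs {hear-left, hear-right}
GAMMA = 0.95
LISTEN = 0

def T(s, a, s2):
    """Transition model: listening leaves the state unchanged; opening a door resets it uniformly."""
    if a == LISTEN:
        return 1.0 if s2 == s else 0.0
    return 0.5

def Z(s2, a, o):
    """Observation model: listening reports the tiger's side with 0.85 accuracy; otherwise uninformative."""
    if a == LISTEN:
        return 0.85 if o == s2 else 0.15
    return 0.5

def R(s, a):
    """Reward: listening costs 1; opening the tiger's door costs 100, the other door pays 10."""
    if a == LISTEN:
        return -1.0
    return -100.0 if (a - 1) == s else 10.0

def evaluate(act, succ, n_nodes, sweeps=300):
    """Expected discounted value of the deterministic controller
    (action act[n] in node n, successor succ[n][o] on observation o),
    from a uniform initial belief and start node 0, by iterating the
    Bellman backup on the joint (state, node) Markov chain."""
    V = [[0.0] * n_nodes for _ in range(S)]
    for _ in range(sweeps):
        V = [[R(s, act[n]) + GAMMA * sum(
                  T(s, act[n], s2) * Z(s2, act[n], o) * V[s2][succ[n][o]]
                  for s2 in range(S) for o in range(O))
              for n in range(n_nodes)] for s in range(S)]
    return 0.5 * (V[0][0] + V[1][0])

def local_search(n_nodes=3, steps=150, seed=0):
    """Greedy hill climbing: mutate one controller entry at a time and
    keep the mutation iff the controller's value does not decrease.
    Returns (initial value, best value found)."""
    rng = random.Random(seed)
    act = [rng.randrange(A) for _ in range(n_nodes)]
    succ = [[rng.randrange(n_nodes) for _ in range(O)] for _ in range(n_nodes)]
    v0 = best = evaluate(act, succ, n_nodes)
    for _ in range(steps):
        if rng.random() < 0.5:            # mutate a node's action
            n = rng.randrange(n_nodes)
            old, act[n] = act[n], rng.randrange(A)
            if (v := evaluate(act, succ, n_nodes)) >= best:
                best = v
            else:
                act[n] = old
        else:                             # mutate an observation transition
            n, o = rng.randrange(n_nodes), rng.randrange(O)
            old, succ[n][o] = succ[n][o], rng.randrange(n_nodes)
            if (v := evaluate(act, succ, n_nodes)) >= best:
                best = v
            else:
                succ[n][o] = old
    return v0, best
```

A handy sanity check on the evaluator: the one-node always-listen controller earns -1 per step regardless of state, so its value is exactly -1/(1 - γ) = -20 for γ = 0.95.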