Memoryless policies: theoretical limitations and practical results
SAB94 Proceedings of the third international conference on Simulation of adaptive behavior: From animals to animats 3
Acting optimally in partially observable stochastic domains
AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
Adaptive Behavior
An improved policy iteration algorithm for partially observable MDPs
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Gradient descent for general reinforcement learning
Proceedings of the 1998 conference on Advances in neural information processing systems 11
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Learning Policies with External Memory
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Reinforcement learning with selective perception and hidden state
Exact and approximate algorithms for partially observable Markov decision processes
Planning and control in stochastic domains with imperfect information
Finite-memory control of partially observable systems
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Planning and acting in partially observable stochastic domains
Artificial Intelligence
Solving POMDPs by searching the space of finite policies
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Solving POMDPs by searching in policy space
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Complexity of finite-horizon Markov decision process problems
Journal of the ACM (JACM)
Solving Partially Observable Problems by Evolution and Learning of Finite State Machines
ICES '01 Proceedings of the 4th International Conference on Evolvable Systems: From Biology to Hardware
Bounds on Sample Size for Policy Evaluation in Markov Environments
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
On the undecidability of probabilistic planning and related stochastic optimization problems
Artificial Intelligence - special issue on planning with uncertainty and incomplete information
Basic Ideas for Event-Based Optimization of Markov Systems
Discrete Event Dynamic Systems
Toward Optimal Classifier System Performance in Non-Markov Environments
Evolutionary Computation
Memory analysis and significance test for agent behaviours
Proceedings of the 8th annual conference on Genetic and evolutionary computation
Reinforcement Learning State Estimator
Neural Computation
Point-Based Value Iteration for Continuous POMDPs
The Journal of Machine Learning Research
A reinforcement learning framework for online data migration in hierarchical storage systems
The Journal of Supercomputing
Accelerated Neural Evolution through Cooperatively Coevolved Synapses
The Journal of Machine Learning Research
ECML '07 Proceedings of the 18th European conference on Machine Learning
A comparison between ATNoSFERES and Learning Classifier Systems on non-Markov problems
Information Sciences: an International Journal
Predictive representations for policy gradient in POMDPs
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Stochastic local search for POMDP controllers
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence
Automatic synthesis of a global behavior from multiple distributed behaviors
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Nonapproximability results for partially observable Markov decision processes
Journal of Artificial Intelligence Research
Finding approximate POMDP solutions through belief compression
Journal of Artificial Intelligence Research
Learning partially observable deterministic action models
Journal of Artificial Intelligence Research
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research
Automatic synthesis of new behaviors from a library of available behaviors
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Solving POMDPs with continuous or large discrete observation spaces
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Inference and Learning in Planning (Extended Abstract)
DS '09 Proceedings of the 12th International Conference on Discovery Science
RL-Based Memory Controller for Scalable Autonomous Systems
ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II
Conformant plans and beyond: Principles and complexity
Artificial Intelligence
An experimental comparison between ATNoSFERES and ACS
IWLCS'03-05 Proceedings of the 2003-2005 international conference on Learning classifier systems
Solving deep memory POMDPs with recurrent policy gradients
ICANN'07 Proceedings of the 17th international conference on Artificial neural networks
Constraint-Based Controller Synthesis in Non-Deterministic and Partially Observable Domains
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence
A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems
Neural Processing Letters
Planning with incomplete information
MoChArt'10 Proceedings of the 6th international conference on Model checking and artificial intelligence
The thing that we tried didn't work very well: deictic representation in reinforcement learning
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
PEGASUS: a policy search method for large MDPs and POMDPs
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Learning to cooperate via policy search
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Efficient non-linear control through neuroevolution
ECML'06 Proceedings of the 17th European conference on Machine Learning
Model-based online learning of POMDPs
ECML'05 Proceedings of the 16th European conference on Machine Learning
Online expectation maximization for reinforcement learning in POMDPs
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time step.
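To make the idea concrete, the sketch below shows a stochastic finite-state controller whose memory nodes stand in for the hidden environment state, trained by stochastic gradient ascent on reward. This is a minimal illustration, not the paper's VAPS derivation: it uses a plain REINFORCE-style gradient estimate, a hypothetical softmax parameterization, and a toy one-decision task (a cue is observed, then the rewarded action must match the cue under perceptual aliasing), where any memoryless policy earns only 0.5 expected reward. All class and function names are illustrative.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class FiniteStateController:
    """Stochastic finite-state controller. Both the per-node action
    distribution and the observation-conditioned node transitions are
    softmax-parameterized, so the policy is differentiable.
    (Hypothetical parameterization for illustration.)"""
    def __init__(self, n_nodes, n_obs, n_actions, rng):
        self.n_nodes, self.n_obs, self.n_actions = n_nodes, n_obs, n_actions
        # theta_a[node, action] and theta_t[node, obs, next_node]
        self.theta_a = 0.1 * rng.standard_normal((n_nodes, n_actions))
        self.theta_t = 0.1 * rng.standard_normal((n_nodes, n_obs, n_nodes))

def run_episode(fsc, rng):
    """One episode of the toy aliased task: observe a cue, update memory,
    then act without seeing the cue again. Returns the reward and
    REINFORCE gradient estimates (grad log softmax = onehot - probs)."""
    cue = int(rng.integers(2))
    node = 0
    p_t = softmax(fsc.theta_t[node, cue])
    node2 = rng.choice(fsc.n_nodes, p=p_t)   # memory update from the cue
    p_a = softmax(fsc.theta_a[node2])
    action = rng.choice(fsc.n_actions, p=p_a)
    reward = 1.0 if action == cue else 0.0
    g_t = np.zeros_like(fsc.theta_t)
    g_t[node, cue] = -p_t
    g_t[node, cue, node2] += 1.0
    g_a = np.zeros_like(fsc.theta_a)
    g_a[node2] = -p_a
    g_a[node2, action] += 1.0
    return reward, g_a, g_t

def train(fsc, episodes, lr, rng):
    """Stochastic gradient ascent on expected reward with a constant
    baseline; returns the average return over 500 evaluation episodes."""
    baseline = 0.5
    for _ in range(episodes):
        r, g_a, g_t = run_episode(fsc, rng)
        fsc.theta_a += lr * (r - baseline) * g_a
        fsc.theta_t += lr * (r - baseline) * g_t
    return float(np.mean([run_episode(fsc, rng)[0] for _ in range(500)]))

rng = np.random.default_rng(0)
fsc = FiniteStateController(n_nodes=2, n_obs=2, n_actions=2, rng=rng)
avg_return = train(fsc, episodes=2000, lr=0.3, rng=rng)
```

A memoryless policy is capped at 0.5 average return on this task, so any reliable improvement above that level must come from the controller storing the cue in its memory node, which is exactly the "extract useful information from past observations" effect the abstract describes.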