Memoryless policies: theoretical limitations and practical results
SAB94 Proceedings of the third international conference on Simulation of adaptive behavior: From animals to animats 3
Acting optimally in partially observable stochastic domains
AAAI'94 Proceedings of the twelfth national conference on Artificial intelligence (vol. 2)
Adaptive Behavior
An improved policy iteration algorithm for partially observable MDPs
NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Gradient descent for general reinforcement learning
Proceedings of the 1998 conference on Advances in neural information processing systems 11
Markov Decision Processes: Discrete Stochastic Dynamic Programming
Introduction to Reinforcement Learning
Neuro-Dynamic Programming
Learning to Predict by the Methods of Temporal Differences
Machine Learning
Learning Policies with External Memory
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Reinforcement learning with selective perception and hidden state
Exact and approximate algorithms for partially observable Markov decision processes
Planning and control in stochastic domains with imperfect information
Finite-memory control of partially observable systems
Reinforcement learning: a survey
Journal of Artificial Intelligence Research
Planning and acting in partially observable stochastic domains
Artificial Intelligence
Solving POMDPs by searching the space of finite policies
UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
Solving POMDPs by searching in policy space
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Complexity of finite-horizon Markov decision process problems
Journal of the ACM (JACM)
Solving Partially Observable Problems by Evolution and Learning of Finite State Machines
ICES '01 Proceedings of the 4th International Conference on Evolvable Systems: From Biology to Hardware
Bounds on Sample Size for Policy Evaluation in Markov Environments
COLT '01/EuroCOLT '01 Proceedings of the 14th Annual Conference on Computational Learning Theory and 5th European Conference on Computational Learning Theory
On the undecidability of probabilistic planning and related stochastic optimization problems
Artificial Intelligence - special issue on planning with uncertainty and incomplete information
Basic Ideas for Event-Based Optimization of Markov Systems
Discrete Event Dynamic Systems
Toward Optimal Classifier System Performance in Non-Markov Environments
Evolutionary Computation
Memory analysis and significance test for agent behaviours
Proceedings of the 8th annual conference on Genetic and evolutionary computation
Reinforcement Learning State Estimator
Neural Computation
Point-Based Value Iteration for Continuous POMDPs
The Journal of Machine Learning Research
A reinforcement learning framework for online data migration in hierarchical storage systems
The Journal of Supercomputing
Accelerated Neural Evolution through Cooperatively Coevolved Synapses
The Journal of Machine Learning Research
ECML '07 Proceedings of the 18th European conference on Machine Learning
A comparison between ATNoSFERES and Learning Classifier Systems on non-Markov problems
Information Sciences: an International Journal
Predictive representations for policy gradient in POMDPs
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Stochastic local search for POMDP controllers
AAAI'04 Proceedings of the 19th national conference on Artificial intelligence
Automatic synthesis of a global behavior from multiple distributed behaviors
AAAI'07 Proceedings of the 22nd national conference on Artificial intelligence - Volume 2
Nonapproximability results for partially observable Markov decision processes
Journal of Artificial Intelligence Research
Finding approximate POMDP solutions through belief compression
Journal of Artificial Intelligence Research
Learning partially observable deterministic action models
Journal of Artificial Intelligence Research
Infinite-horizon policy-gradient estimation
Journal of Artificial Intelligence Research
Automatic synthesis of new behaviors from a library of available behaviors
IJCAI'07 Proceedings of the 20th international joint conference on Artificial intelligence
Solving POMDPs with continuous or large discrete observation spaces
IJCAI'05 Proceedings of the 19th international joint conference on Artificial intelligence
Inference and Learning in Planning (Extended Abstract)
DS '09 Proceedings of the 12th International Conference on Discovery Science
RL-Based Memory Controller for Scalable Autonomous Systems
ICONIP '09 Proceedings of the 16th International Conference on Neural Information Processing: Part II
Conformant plans and beyond: Principles and complexity
Artificial Intelligence
An experimental comparison between ATNoSFERES and ACS
IWLCS'03-05 Proceedings of the 2003-2005 international conference on Learning classifier systems
Solving deep memory POMDPs with recurrent policy gradients
ICANN'07 Proceedings of the 17th international conference on Artificial neural networks
Constraint-Based Controller Synthesis in Non-Deterministic and Partially Observable Domains
Proceedings of the 2010 conference on ECAI 2010: 19th European Conference on Artificial Intelligence
A Modified Memory-Based Reinforcement Learning Method for Solving POMDP Problems
Neural Processing Letters
Planning with incomplete information
MoChArt'10 Proceedings of the 6th international conference on Model checking and artificial intelligence
The thing that we tried didn't work very well: deictic representation in reinforcement learning
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
PEGASUS: a policy search method for large MDPs and POMDPs
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Learning to cooperate via policy search
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Efficient non-linear control through neuroevolution
ECML'06 Proceedings of the 17th European conference on Machine Learning
Model-based online learning of POMDPs
ECML'05 Proceedings of the 16th European conference on Machine Learning
Online expectation maximization for reinforcement learning in POMDPs
IJCAI'13 Proceedings of the Twenty-Third international joint conference on Artificial Intelligence
Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time step.
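To make the idea concrete, the sketch below shows a stochastic finite-state controller whose memory nodes stand in for the hidden environment state, trained by stochastic gradient ascent on reward. This is a minimal illustration, not the paper's VAPS derivation: it uses a plain REINFORCE-style gradient estimate, a hypothetical softmax parameterization, and a toy one-decision task (a cue is observed, then the rewarded action must match the cue under perceptual aliasing), where any memoryless policy earns only 0.5 expected reward. All class and function names are illustrative.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax along the last axis.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class FiniteStateController:
    """Stochastic finite-state controller. Both the per-node action
    distribution and the observation-conditioned node transitions are
    softmax-parameterized, so the policy is differentiable.
    (Hypothetical parameterization for illustration.)"""
    def __init__(self, n_nodes, n_obs, n_actions, rng):
        self.n_nodes, self.n_obs, self.n_actions = n_nodes, n_obs, n_actions
        # theta_a[node, action] and theta_t[node, obs, next_node]
        self.theta_a = 0.1 * rng.standard_normal((n_nodes, n_actions))
        self.theta_t = 0.1 * rng.standard_normal((n_nodes, n_obs, n_nodes))

def run_episode(fsc, rng):
    """One episode of the toy aliased task: observe a cue, update memory,
    then act without seeing the cue again. Returns the reward and
    REINFORCE gradient estimates (grad log softmax = onehot - probs)."""
    cue = int(rng.integers(2))
    node = 0
    p_t = softmax(fsc.theta_t[node, cue])
    node2 = rng.choice(fsc.n_nodes, p=p_t)   # memory update from the cue
    p_a = softmax(fsc.theta_a[node2])
    action = rng.choice(fsc.n_actions, p=p_a)
    reward = 1.0 if action == cue else 0.0
    g_t = np.zeros_like(fsc.theta_t)
    g_t[node, cue] = -p_t
    g_t[node, cue, node2] += 1.0
    g_a = np.zeros_like(fsc.theta_a)
    g_a[node2] = -p_a
    g_a[node2, action] += 1.0
    return reward, g_a, g_t

def train(fsc, episodes, lr, rng):
    """Stochastic gradient ascent on expected reward with a constant
    baseline; returns the average return over 500 evaluation episodes."""
    baseline = 0.5
    for _ in range(episodes):
        r, g_a, g_t = run_episode(fsc, rng)
        fsc.theta_a += lr * (r - baseline) * g_a
        fsc.theta_t += lr * (r - baseline) * g_t
    return float(np.mean([run_episode(fsc, rng)[0] for _ in range(500)]))

rng = np.random.default_rng(0)
fsc = FiniteStateController(n_nodes=2, n_obs=2, n_actions=2, rng=rng)
avg_return = train(fsc, episodes=2000, lr=0.3, rng=rng)
```

A memoryless policy is capped at 0.5 average return on this task, so any reliable improvement above that level must come from the controller storing the cue in its memory node, which is exactly the "extract useful information from past observations" effect the abstract describes.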