We present Policy Gradient Actor-Critic (PGAC), a new model-free Reinforcement Learning (RL) method for creating limited-memory stochastic policies for Partially Observable Markov Decision Processes (POMDPs) that require long-term memory of past observations and actions. The approach estimates a policy gradient for an Actor through a Policy Gradient Critic that evaluates probability distributions on actions. Gradient-based updates of history-conditional action probability distributions enable the algorithm to learn a mapping from memory states (or event histories) to probability distributions on actions, solving POMDPs through a combination of memory and stochasticity. This goes beyond previous approaches that learn purely reactive POMDP policies, without giving up their advantages. Preliminary results on important benchmark tasks show that our approach can in principle serve as a general-purpose POMDP algorithm, solving RL problems in both continuous and discrete action domains.
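To make the moving parts concrete, the following is a minimal sketch of the idea described above: a recurrent Actor maps event histories to action probability distributions, and a Policy Gradient Critic scores those distributions so the Actor can be updated by following the Critic's gradient. The recurrent actor, the bilinear critic, the toy delayed-cue task, and all hyperparameters here are illustrative assumptions, not the authors' architecture.

```python
# Illustrative sketch of a PGAC-style update loop (assumed details throughout).
import numpy as np

rng = np.random.default_rng(0)
N_OBS, N_HID, N_ACT = 3, 8, 2          # observation / memory / action sizes
LR_ACTOR, LR_CRITIC, EPISODES, T = 0.1, 0.1, 3000, 3

# Actor: a small recurrent net whose hidden state is the "memory state".
W_in  = rng.normal(0, 0.3, (N_HID, N_OBS))
W_rec = rng.normal(0, 0.3, (N_HID, N_HID))
W_out = np.zeros((N_ACT, N_HID))

# Policy Gradient Critic: scores a (memory state, action distribution) pair,
# Q(h, p) = h^T M p + b . p, so dQ/dp depends on the memory state.
M = np.zeros((N_HID, N_ACT))
b = np.zeros(N_ACT)

def actor_step(h, obs):
    """Fold the observation into memory, return new memory + action distribution."""
    h = np.tanh(W_in @ obs + W_rec @ h)
    logits = W_out @ h
    e = np.exp(logits - logits.max())
    return h, e / e.sum()

for episode in range(EPISODES):
    cue = rng.integers(2)              # which action will be rewarded at the end
    h = np.zeros(N_HID)
    for t in range(T):                 # the cue is visible only at t = 0
        obs = np.zeros(N_OBS)
        obs[cue if t == 0 else 2] = 1.0
        h, p = actor_step(h, obs)
    action = rng.choice(N_ACT, p=p)    # stochastic policy: sample an action
    R = 1.0 if action == cue else 0.0  # reward requires remembering the cue

    # Critic update: regress Q(h, p) toward the observed return.
    q = h @ M @ p + b @ p
    M += LR_CRITIC * (R - q) * np.outer(h, p)
    b += LR_CRITIC * (R - q) * p

    # Actor update: push the critic's gradient w.r.t. the action distribution
    # back through the softmax into the output weights (backpropagation
    # through the recurrent weights is omitted for brevity).
    dq_dp = M.T @ h + b
    grad_logits = p * (dq_dp - p @ dq_dp)   # softmax Jacobian applied to dQ/dp
    W_out += LR_ACTOR * np.outer(grad_logits, h)
```

The point worth noting in this sketch is that the critic takes the whole distribution p as input rather than a sampled action, so its gradient dQ/dp gives the actor a direct ascent direction on the distribution itself; this is one plausible reading of the "evaluates probability distributions on actions" idea in the abstract, not a reconstruction of the authors' exact method.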