This paper presents Recurrent Policy Gradients, a model-free reinforcement learning (RL) method that creates limited-memory stochastic policies for partially observable Markov decision problems (POMDPs) requiring long-term memory of past observations. The approach approximates a policy gradient for a Recurrent Neural Network (RNN) by backpropagating return-weighted characteristic eligibilities through time. Using a "Long Short-Term Memory" (LSTM) architecture, we are able to outperform other RL methods on two important benchmark tasks. Furthermore, we show promising results on a complex car-driving simulation task.
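The core idea above — an episodic, return-weighted likelihood-ratio (REINFORCE-style) gradient whose eligibilities are backpropagated through time in a recurrent policy — can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses a vanilla tanh RNN instead of LSTM, a softmax action layer, and hypothetical parameter names (`Wx`, `Wh`, `Wo`), all chosen only to keep the example short.

```python
import numpy as np

# Sketch (assumed setup, not the paper's code): a tiny recurrent stochastic
# policy trained with an episodic REINFORCE-style gradient. The gradient of
# sum_t log pi(a_t | history_t) is computed by backpropagation through time
# (BPTT) and then scaled by the episode return, as in the abstract.

rng = np.random.default_rng(0)
n_in, n_hid, n_act = 3, 5, 2

params = {
    "Wx": rng.normal(scale=0.1, size=(n_hid, n_in)),   # input -> hidden
    "Wh": rng.normal(scale=0.1, size=(n_hid, n_hid)),  # hidden -> hidden
    "Wo": rng.normal(scale=0.1, size=(n_act, n_hid)),  # hidden -> action logits
}

def forward(params, xs):
    """Run the RNN over observations xs; return hidden states and action probs."""
    hs, ps = [np.zeros(n_hid)], []
    for x in xs:
        h = np.tanh(params["Wx"] @ x + params["Wh"] @ hs[-1])
        logits = params["Wo"] @ h
        e = np.exp(logits - logits.max())
        hs.append(h)
        ps.append(e / e.sum())
    return hs, ps

def logp_grad(params, xs, actions):
    """BPTT gradient of sum_t log pi(a_t | history_t) w.r.t. all parameters."""
    hs, ps = forward(params, xs)
    g = {k: np.zeros_like(v) for k, v in params.items()}
    dh_next = np.zeros(n_hid)
    logp = 0.0
    for t in reversed(range(len(xs))):
        p, h, h_prev = ps[t], hs[t + 1], hs[t]
        logp += np.log(p[actions[t]])
        dlogits = -p
        dlogits[actions[t]] += 1.0              # grad of log-softmax
        g["Wo"] += np.outer(dlogits, h)
        dh = params["Wo"].T @ dlogits + dh_next  # local + future credit
        dpre = dh * (1.0 - h ** 2)               # tanh derivative
        g["Wx"] += np.outer(dpre, xs[t])
        g["Wh"] += np.outer(dpre, h_prev)
        dh_next = params["Wh"].T @ dpre          # flows back one step in time
    return logp, g

def reinforce_step(params, xs, actions, ret, lr=0.01, baseline=0.0):
    """One return-weighted eligibility update for a complete episode."""
    _, g = logp_grad(params, xs, actions)
    for k in params:
        params[k] += lr * (ret - baseline) * g[k]
```

A baseline (here a plain constant, in practice an estimated value) reduces the variance of the return-weighted update without biasing it.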