We present a model-free reinforcement learning method for partially observable Markov decision problems (POMDPs). Our method estimates a likelihood gradient by sampling directly in parameter space, which yields lower-variance gradient estimates than those obtained by policy gradient methods such as REINFORCE. On several complex control tasks, including robust standing with a humanoid robot, we show that our method outperforms well-known algorithms from the fields of policy gradients, finite-difference methods, and population-based heuristics. We also provide a detailed analysis of the differences between our method and the other algorithms.
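The core idea — sampling whole parameter vectors from a search distribution and following the likelihood gradient of expected return with respect to that distribution's parameters — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy `episode_return`, the fixed `TARGET`, and all hyperparameter values are assumptions chosen so the sketch runs standalone; in the paper's setting the return would come from a rollout of a controller whose weights are the sampled parameters.

```python
import numpy as np

# Hypothetical stand-in for an environment: the "episode return" of a
# parameter vector theta is higher the closer theta is to a fixed target.
TARGET = np.array([1.0, -2.0, 0.5])

def episode_return(theta):
    # Illustrative surrogate for running one episode with controller weights theta.
    return -np.sum((theta - TARGET) ** 2)

def parameter_exploring_gradient(num_iters=300, pop=20, alpha=0.05, seed=0):
    """Sketch of parameter-space likelihood-gradient ascent: sample parameter
    vectors from a Gaussian, then update the Gaussian's mean and per-dimension
    standard deviation along the estimated gradient of expected return."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(3)       # mean of the search distribution
    sigma = np.ones(3)     # per-dimension exploration width
    baseline = 0.0         # moving-average baseline, reduces estimator variance
    for _ in range(num_iters):
        eps = rng.normal(size=(pop, 3)) * sigma      # parameter perturbations
        returns = np.array([episode_return(mu + e) for e in eps])
        adv = returns - baseline
        baseline = 0.9 * baseline + 0.1 * returns.mean()
        # Likelihood-gradient estimates for a Gaussian search distribution
        grad_mu = (eps * adv[:, None]).mean(axis=0) / sigma**2
        grad_sigma = (((eps**2 - sigma**2) / sigma) * adv[:, None]).mean(axis=0)
        mu += alpha * grad_mu
        sigma = np.maximum(sigma + alpha * grad_sigma, 1e-3)
    return mu

mu = parameter_exploring_gradient()
```

Because each sampled parameter vector is held fixed for an entire episode, the per-step action noise of REINFORCE-style estimators is avoided, which is the source of the variance reduction the abstract refers to.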