This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing because they are independent of the coordinate frame of the chosen policy representation and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
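The critic's regression step can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a one-state problem (a bandit) with a softmax policy and known expected rewards, so the regression is computed exactly rather than from samples. The function names and the numerical values are hypothetical. It regresses rewards on the compatible features `grad log pi(a)` plus a constant, and compares the resulting weights with the Fisher-preconditioned vanilla gradient.

```python
import numpy as np


def softmax(theta):
    """Softmax policy over discrete actions (illustrative assumption)."""
    z = np.exp(theta - theta.max())
    return z / z.sum()


def natural_gradient_by_regression(theta, rewards):
    """Weighted least squares of rewards on compatible features plus a constant.

    Returns (w, v): w plays the role of the natural policy gradient,
    v the role of the value-function parameter.
    """
    pi = softmax(theta)
    k = len(theta)
    psi = np.eye(k) - pi  # row a holds grad_theta log pi(a) = e_a - pi
    design = np.hstack([psi, np.ones((k, 1))])
    sw = np.sqrt(pi)[:, None]  # weight each action by its probability
    solution, *_ = np.linalg.lstsq(sw * design, sw[:, 0] * rewards, rcond=None)
    return solution[:-1], solution[-1]


def natural_gradient_by_fisher(theta, rewards):
    """Reference computation: pseudo-inverse of the Fisher matrix times the vanilla gradient."""
    pi = softmax(theta)
    psi = np.eye(len(theta)) - pi
    vanilla = psi.T @ (pi * rewards)       # sum_a pi(a) r(a) grad log pi(a)
    fisher = psi.T @ (pi[:, None] * psi)   # sum_a pi(a) psi(a) psi(a)^T
    return np.linalg.pinv(fisher) @ vanilla, vanilla


# Hypothetical policy parameters and expected rewards for three actions.
theta = np.array([0.5, -1.0, 2.0])
rewards = np.array([1.0, 0.0, 3.0])

w, v = natural_gradient_by_regression(theta, rewards)
w_fisher, vanilla = natural_gradient_by_fisher(theta, rewards)
print("regression weights :", w)
print("Fisher-based result:", w_fisher)
print("vanilla gradient   :", vanilla)
print("value estimate     :", v)
```

In this toy setting the regression weights reduce to the mean-centered rewards, whatever the current action probabilities are, while the vanilla gradient is scaled by those probabilities. The Fisher matrix of a softmax policy is singular, so the sketch uses a pseudo-inverse and the minimum-norm least-squares solution; a sampled, multi-state implementation would need the value-function basis and estimation details from the paper itself.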