In this paper, we propose a novel reinforcement learning architecture, the Natural Actor-Critic. The actor is updated with stochastic policy gradients that follow Amari's natural gradient approach, while the critic obtains both the natural policy gradient and the additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing because they are independent of the coordinate frame of the chosen policy representation and can be estimated more efficiently than regular policy gradients. The critic uses a special basis function parameterization motivated by policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods and demonstrate their applicability to learning control on an anthropomorphic robot arm.
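The idea that the critic recovers the natural policy gradient by linear regression can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a stateless, one-step problem, a Gaussian policy with fixed standard deviation, and a quadratic reward, all chosen here for illustration. The names `natural_actor_critic_step`, `run_demo`, `reward_fn`, and `target` are hypothetical. Rewards are regressed on the compatible features (the gradient of the log-policy) plus a constant baseline term, and the regression weights on the compatible features are used directly as the actor's update direction.

```python
import numpy as np


def natural_actor_critic_step(theta, sigma, reward_fn, rng,
                              n_samples=200, step_size=0.5):
    """One actor update for a Gaussian policy N(theta, sigma^2 I).

    Assumption: one-step, stateless setting, so the "value function"
    part of the critic reduces to a single constant baseline.
    """
    d = theta.shape[0]
    # Sample actions from the current stochastic policy.
    actions = theta + sigma * rng.standard_normal((n_samples, d))
    rewards = reward_fn(actions)
    # Compatible features: gradient of log pi(a) with respect to theta.
    psi = (actions - theta) / sigma**2
    # Critic: least-squares fit  reward ~ psi @ w + baseline.
    design = np.hstack([psi, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(design, rewards, rcond=None)
    w = coef[:d]  # regression weights = natural gradient estimate
    # Actor: step along the natural gradient estimate.
    return theta + step_size * w


def run_demo(seed=0, n_iters=60):
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -2.0])  # hypothetical optimum of the reward

    def reward_fn(actions):
        # Quadratic reward, maximal when the action equals the target.
        return -np.sum((actions - target) ** 2, axis=1)

    theta = np.zeros(2)
    for _ in range(n_iters):
        theta = natural_actor_critic_step(theta, 0.5, reward_fn, rng)
    return theta, target


if __name__ == "__main__":
    theta, target = run_demo()
    print("learned mean:", theta, "target:", target)
```

In this toy setting the regression weights equal the Fisher-preconditioned gradient in expectation, so the policy mean contracts toward the reward optimum at a rate that does not depend on how the mean is parameterized. A sequential problem would additionally need state-dependent value function features in the regression.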