Natural Actor-Critic

Authors:
Jan Peters; Stefan Schaal
Affiliations:
Max-Planck-Institute for Biological Cybernetics, Tuebingen, Germany and University of Southern California, Los Angeles, CA 90089, USA; University of Southern California, Los Angeles, CA 90089, USA and ATR Computational Neuroscience Laboratories, Kyoto 619-0288, Japan
Venue:
Neurocomputing
Year:
2008

Abstract

In this paper, we propose a novel reinforcement learning architecture, the Natural Actor-Critic. The actor is updated with stochastic policy gradients that employ Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing because they are independent of the coordinate frame of the chosen policy representation and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods and also demonstrate their applicability to learning control on an anthropomorphic robot arm.
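The idea that the critic recovers the natural policy gradient by linear regression can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes a one-step problem with a single state, a one-dimensional Gaussian policy u ~ N(theta, sigma^2) with fixed sigma, and a hypothetical quadratic reward. The names `natural_gradient_estimate` and `toy_reward`, the sample size, and the step size are all illustrative assumptions. Rewards are regressed on the compatible basis function psi = d/dtheta log pi(u) plus a constant, and the resulting slope serves as the natural gradient estimate.

```python
import random


def toy_reward(u, target=1.0):
    # Hypothetical reward, chosen only so the answer is known in closed form.
    return -(u - target) ** 2


def natural_gradient_estimate(theta, sigma, reward, n_samples=20000, seed=0):
    """Regress sampled rewards on the compatible feature and a constant.

    Assumes a one-step problem and a Gaussian policy u ~ N(theta, sigma^2).
    Returns (w, b): the slope w estimates the natural policy gradient and
    the intercept b estimates the expected reward (a baseline).
    """
    rng = random.Random(seed)
    psis, rewards = [], []
    for _ in range(n_samples):
        u = rng.gauss(theta, sigma)
        # Compatible basis function: derivative of log pi(u) w.r.t. theta.
        psis.append((u - theta) / sigma ** 2)
        rewards.append(reward(u))

    # Closed-form ordinary least squares for reward ~ w * psi + b.
    mean_psi = sum(psis) / n_samples
    mean_r = sum(rewards) / n_samples
    cov = sum((p - mean_psi) * (r - mean_r) for p, r in zip(psis, rewards))
    var = sum((p - mean_psi) ** 2 for p in psis)
    w = cov / var
    b = mean_r - w * mean_psi
    return w, b


theta, sigma = 0.0, 0.5
w, b = natural_gradient_estimate(theta, sigma, toy_reward)

# Analytic reference values for this toy problem only.
vanilla_gradient = -2.0 * (theta - 1.0)          # d/dtheta of E[reward]
natural_gradient = sigma ** 2 * vanilla_gradient  # Fisher information is 1 / sigma^2

# Actor update along the estimated natural gradient (step size is arbitrary).
theta_new = theta + 0.5 * w
```

In this toy setting the regression slope `w` should approach the analytic `natural_gradient`, which is the regular gradient rescaled by the inverse Fisher information. The regression never forms or inverts the Fisher matrix explicitly, which is the property the abstract attributes to the critic.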