Natural actor-critic

Authors:
Jan Peters; Sethu Vijayakumar; Stefan Schaal
Affiliations:
University of Southern California, Los Angeles, CA; University of Edinburgh, Edinburgh, United Kingdom; University of Southern California, Los Angeles, CA
Venue:
ECML'05: Proceedings of the 16th European Conference on Machine Learning
Year:
2005

Abstract

This paper investigates a novel model-free reinforcement learning architecture, the Natural Actor-Critic. The actor updates are based on stochastic policy gradients employing Amari's natural gradient approach, while the critic simultaneously obtains both the natural policy gradient and additional parameters of a value function by linear regression. We show that actor improvements with natural policy gradients are particularly appealing because they are independent of the coordinate frame of the chosen policy representation and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods, such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning, are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
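The idea that the critic can recover the natural policy gradient by linear regression can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a heavily simplified one-step (bandit) problem, a one-dimensional Gaussian policy with fixed standard deviation, and a hypothetical quadratic reward. All names (`natural_gradient_by_regression`, `train`, `target`, `alpha`) are illustrative assumptions. Rewards are regressed on the compatible feature psi = d/dtheta log pi(a) plus a constant; the coefficient on psi estimates the natural gradient and the intercept plays the role of a value baseline.

```python
import random


def natural_gradient_by_regression(theta, sigma, reward_fn, n_samples, rng):
    """Regress sampled rewards on [psi, 1] for the policy N(theta, sigma^2).

    psi = (a - theta) / sigma**2 is the score function (compatible feature).
    Returns (w, v): w estimates the natural gradient, v is the intercept.
    """
    s_pp = s_p = s_pr = s_r = 0.0
    for _ in range(n_samples):
        a = rng.gauss(theta, sigma)       # sample an action from the policy
        psi = (a - theta) / sigma ** 2    # gradient of log-policy w.r.t. theta
        r = reward_fn(a)
        s_pp += psi * psi
        s_p += psi
        s_pr += psi * r
        s_r += r
    n = float(n_samples)
    # Solve the 2x2 normal equations [[s_pp, s_p], [s_p, n]] [w, v]^T = [s_pr, s_r]^T.
    det = s_pp * n - s_p * s_p
    w = (n * s_pr - s_p * s_r) / det
    v = (s_pp * s_r - s_p * s_pr) / det
    return w, v


def train(target=2.0, sigma=0.5, alpha=0.5, iterations=50, n_samples=200, seed=0):
    """Actor loop: move the policy mean along the regressed natural gradient."""
    rng = random.Random(seed)

    def reward_fn(a):
        # Hypothetical reward, assumed only for this illustration.
        return -(a - target) ** 2

    theta = 0.0
    for _ in range(iterations):
        w, _ = natural_gradient_by_regression(theta, sigma, reward_fn, n_samples, rng)
        theta += alpha * w                # actor update uses w directly
    return theta


if __name__ == "__main__":
    print("learned policy mean:", train())
```

In this toy setting the regression coefficient equals the ordinary policy gradient premultiplied by the inverse Fisher information (here sigma squared), so the actor never has to form or invert the Fisher matrix explicitly. The full method described in the abstract additionally estimates value-function parameters over states, which this sketch leaves out.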