Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial to achieving true autonomy in advanced machines like humanoid robots. In this paper, we examine the learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, only reinforcement learning offers a general approach to the final trial-and-error improvement that each individual needs when acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning in humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic, outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
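To make the core idea concrete, the following is a minimal sketch of episodic likelihood-ratio ("REINFORCE"-style) policy gradient learning with a baseline, plus the natural-gradient preconditioning step for a Gaussian policy. The task, parameter names, and learning rates are all hypothetical illustrations, not the paper's actual algorithm or experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy task: the "motor primitive" has a single parameter
# theta, the stochastic policy samples actions a ~ N(theta, SIGMA^2),
# and the episodic reward peaks when a hits TARGET.
TARGET = 2.0
SIGMA = 0.5  # fixed exploration noise

def rollout_rewards(a):
    # Episodic return for each sampled action (illustrative reward).
    return -(a - TARGET) ** 2

def train(theta=0.0, iterations=500, episodes=20, lr=0.05, natural=True):
    for _ in range(iterations):
        a = rng.normal(theta, SIGMA, size=episodes)  # roll out episodes
        r = rollout_rewards(a)
        b = r.mean()  # mean-reward baseline for variance reduction
        # grad_theta log N(a; theta, SIGMA^2) = (a - theta) / SIGMA^2
        g = np.mean((r - b) * (a - theta) / SIGMA**2)
        if natural:
            # For a Gaussian with fixed SIGMA, the Fisher information of
            # the mean is 1/SIGMA^2, so the natural gradient rescales the
            # vanilla gradient by SIGMA^2.
            g *= SIGMA**2
        theta += lr * g
    return theta

theta = train()
print(f"{theta:.2f}")  # should land near TARGET
```

In this one-dimensional case the natural gradient only rescales the step size, but for multi-dimensional parameterized policies the Fisher matrix also corrects for correlations between parameters, which is the source of the large speedups reported for natural actor-critic methods.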