Reinforcement learning for a biped robot based on a CPG-actor-critic method

Authors:
Yutaka Nakamura; Takeshi Mori; Masa-aki Sato; Shin Ishii
Affiliations:
Yutaka Nakamura: Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan; Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan
Takeshi Mori: Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
Masa-aki Sato: ATR Computational Neuroscience Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
Shin Ishii: Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
Venue:
Neural Networks
Year:
2007

Abstract

Animals' rhythmic movements, such as locomotion, are thought to be controlled by neural circuits called central pattern generators (CPGs), which generate oscillatory signals. Motivated by this biological mechanism, a number of studies have examined rhythmic movements controlled by CPGs. As an autonomous learning framework for a CPG controller, we propose in this article a reinforcement learning method that we call the "CPG-actor-critic" method. The method introduces a new architecture for the actor, and its training is loosely based on a recently proposed stochastic policy gradient algorithm. We apply the method to the problem of automatically acquiring a controller for a biped robot. Computer simulations show that our method successfully trains the CPG, allowing the biped robot not only to walk stably but also to adapt to environmental changes.
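To make the idea of a CPG "generating oscillatory signals" concrete, here is a minimal sketch of one common CPG building block: a two-neuron Matsuoka half-center oscillator with mutual inhibition and self-adaptation. This is an illustrative assumption, not the architecture used in this article. The parameter values (`tau`, `T`, `beta`, `w`, `s`) are generic choices that satisfy the linearized oscillation condition 1 + tau/T < w < 1 + beta. The `feedback` hook is a hypothetical placeholder that marks where a learned, sensory-driven term could enter. The actor-critic training itself is not shown.

```python
import numpy as np


def matsuoka_cpg(duration=30.0, dt=0.005, tau=0.25, T=0.5,
                 beta=2.5, w=2.0, s=1.0, feedback=None):
    """Simulate a two-neuron Matsuoka half-center oscillator with Euler steps.

    Illustrative sketch only (not the article's CPG). Returns (times, output),
    where output = y1 - y2 is the oscillatory signal that could drive a joint.
    `feedback` is a hypothetical callable f(t, output) -> (u1, u2) that adds
    an extra input to each neuron, standing in for a learned feedback term.
    """
    steps = int(round(duration / dt))
    x = np.array([0.1, 0.0])   # membrane states; asymmetric start breaks symmetry
    v = np.zeros(2)            # self-inhibition (adaptation) states
    times = np.arange(steps) * dt
    out = np.empty(steps)
    for k in range(steps):
        y = np.maximum(x, 0.0)  # rectified firing rates
        out[k] = y[0] - y[1]
        u = np.zeros(2) if feedback is None else np.asarray(
            feedback(times[k], out[k]), dtype=float)
        # Each neuron is inhibited by its own adaptation state (beta * v)
        # and by the other neuron's output (w * y reversed), plus tonic input s.
        dx = (-x - beta * v - w * y[::-1] + s + u) / tau
        dv = (-v + y) / T
        x = x + dt * dx
        v = v + dt * dv
    return times, out


t, out = matsuoka_cpg()
late = out[t >= 10.0]  # discard the initial transient
print("sign changes after t = 10 s:", np.count_nonzero(np.diff(np.sign(late))))
```

With these defaults the symmetric equilibrium is unstable and no one-sided equilibrium exists, so `y1 - y2` settles into a sustained alternation between the two neurons. Its magnitude is bounded by the tonic input `s`. In the article's setting, it is the learning of how such an oscillator should be modulated that the CPG-actor-critic method addresses.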