Learning CPG-based Biped Locomotion with a Policy Gradient Method: Application to a Humanoid Robot

Authors:
Gen Endo; Jun Morimoto; Takamitsu Matsubara; Jun Nakanishi; Gordon Cheng
Affiliations:
Tokyo Institute of Technology, 2-12-1 Ookayama, Meguro-ku, Tokyo, 152-8550, Japan; ATR Computational Neuroscience Laboratories, Computational Brain Project, ICORP, Japan Science and Technology Agency, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan; ATR Computational Neuroscience Laboratories, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan; ATR Computational Neuroscience Laboratories, Computational Brain Project, ICORP, Japan Science and Technology Agency, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan; ATR Computational Neuroscience Laboratories, ICORP, Japan Science and Technology Agency, 2-2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan
Venue:
International Journal of Robotics Research
Year:
2008

Abstract

In this paper we describe a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve CPG-based biped walking with a 3D hardware humanoid and to develop an efficient CPG-based learning algorithm by reducing the dimensionality of the state space used for learning. We demonstrate in numerical simulation that an appropriate feedback controller can be acquired within a few thousand trials, and that the controller obtained in simulation achieves stable walking on a physical robot in the real world. We evaluate walking velocity and stability in both numerical simulations and hardware experiments. The results suggest that the learning algorithm is capable of adapting to environmental changes. Furthermore, we present an online learning scheme that starts from an initial policy and improves the controller on the hardware robot within 200 iterations.
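The abstract does not give the update rule itself. As a generic illustration only, and not the authors' algorithm (their CPG feedback controller, state space, and reward are not described here), the core idea of a likelihood-ratio (REINFORCE-style) policy gradient can be sketched on a hypothetical one-parameter task. A Gaussian policy samples an action, a reward is observed, and the policy mean is nudged along the reward-weighted gradient of the log-probability. The function name `reinforce_gaussian`, its parameters, and the quadratic reward are all assumptions made for this sketch.

```python
import random


def reinforce_gaussian(target=0.5, sigma=0.1, lr=0.02, iterations=2000, seed=0):
    """Hypothetical sketch: tune the mean of a 1-D Gaussian policy with the
    likelihood-ratio (REINFORCE) gradient estimator.

    The reward -(a - target)**2 is an illustrative assumption; it is not the
    walking reward used in the paper.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mu = 0.0                   # policy mean: the single learned parameter
    baseline = 0.0             # running average of past rewards (variance reduction)
    for _ in range(iterations):
        a = rng.gauss(mu, sigma)                 # sample an action from N(mu, sigma^2)
        r = -(a - target) ** 2                   # hypothetical reward for this trial
        grad_log_pi = (a - mu) / sigma ** 2      # d/dmu of log N(a; mu, sigma^2)
        mu += lr * (r - baseline) * grad_log_pi  # stochastic gradient ascent on E[r]
        baseline += 0.1 * (r - baseline)         # update baseline after using it
    return mu


if __name__ == "__main__":
    print(reinforce_gaussian())  # ends near the hypothetical target of 0.5
```

Only one parameter is learned here, which loosely mirrors the abstract's point that keeping the learning problem low-dimensional makes policy gradient estimation cheaper. Because the baseline is built only from earlier rewards, subtracting it lowers the variance of each update without biasing the gradient estimate.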