Learning CPG sensory feedback with policy gradient for biped locomotion for a full-body humanoid

Authors:
Gen Endo; Jun Morimoto; Takamitsu Matsubara; Jun Nakanishi; Gordon Cheng
Affiliations:
Gen Endo: Sony Intelligence Dynamics Laboratories, Inc., Shinagawa-ku, Tokyo, Japan and ATR Computational Neuroscience Laboratories, Soraku-gun, Kyoto, Japan
Jun Morimoto: ATR Computational Neuroscience Laboratories, Soraku-gun, Kyoto, Japan and Computational Brain Project, ICORP, Japan Science and Technology Agency, Soraku-gun, Kyoto, Japan
Takamitsu Matsubara: ATR Computational Neuroscience Laboratories, Soraku-gun, Kyoto, Japan and Nara Institute of Science and Technology, Ikoma-shi, Nara, Japan
Jun Nakanishi: ATR Computational Neuroscience Laboratories, Soraku-gun, Kyoto, Japan and Computational Brain Project, ICORP, Japan Science and Technology Agency, Soraku-gun, Kyoto, Japan
Gordon Cheng: ATR Computational Neuroscience Laboratories, Soraku-gun, Kyoto, Japan and Computational Brain Project, ICORP, Japan Science and Technology Agency, Soraku-gun, Kyoto, Japan
Venue:
AAAI'05: Proceedings of the 20th National Conference on Artificial Intelligence, Volume 3
Year:
2005

Abstract

This paper describes a learning framework for a central pattern generator (CPG)-based biped locomotion controller using a policy gradient method. Our goals in this study are to achieve biped walking with a 3D hardware humanoid and to develop an efficient learning algorithm for the CPG-based controller by reducing the dimensionality of the state space used for learning. We demonstrate in numerical simulations that an appropriate sensory feedback controller can be acquired within a thousand trials, and that the controller obtained in simulation achieves stable walking on a physical robot in the real world. We evaluated walking velocity and stability in both numerical simulations and hardware experiments. Furthermore, we show the potential of additional online learning on the hardware robot, which improves the controller within 200 iterations.
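The abstract does not give the controller equations, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It pairs a two-neuron Matsuoka-type neural oscillator (standing in for the CPG) with a REINFORCE-style likelihood-ratio gradient for a linear Gaussian policy that outputs the sensory feedback signal from a low-dimensional state. The oscillator parameters, the antagonistic feedback coupling, the 2-D state, and the toy one-step reward in the hypothetical `train_feedback_gains` are all assumptions; the toy task replaces the biped dynamics entirely.

```python
import numpy as np


def matsuoka_step(u, v, feedback, dt=0.001, tau=0.1, tau_adapt=0.5,
                  beta=2.5, w_inhib=2.0, drive=1.0):
    """Advance a two-neuron Matsuoka-type oscillator by one Euler step.

    u, v: arrays of shape (2,) with the inner and adaptation states.
    feedback: scalar sensory feedback, added to neuron 0 and subtracted
    from neuron 1 (a hypothetical coupling chosen for this sketch).
    """
    y = np.maximum(u, 0.0)                 # rectified neuron outputs
    inhibition = w_inhib * y[::-1]         # each neuron inhibits the other
    fb = np.array([feedback, -feedback])   # antagonistic feedback input
    du = (-u - beta * v - inhibition + drive + fb) / tau
    dv = (-v + y) / tau_adapt              # slow self-adaptation
    return u + dt * du, v + dt * dv


def policy_gradient(theta, states, actions, rewards, sigma):
    """Likelihood-ratio (REINFORCE) gradient estimate for a linear Gaussian
    policy a ~ N(theta . s, sigma^2), with the batch-mean reward as baseline."""
    mean = states @ theta
    # Gradient of log N(a; theta . s, sigma^2) with respect to theta.
    score = ((actions - mean) / sigma**2)[:, None] * states
    advantage = rewards - rewards.mean()
    return (advantage[:, None] * score).mean(axis=0)


def train_feedback_gains(w_target, sigma=0.2, lr=0.05, batch=200,
                         iters=300, seed=0):
    """Toy one-step task standing in for walking: the reward is highest when
    the feedback action equals a hypothetical ideal gain vector w_target
    applied to a low-dimensional state (e.g. two body-posture signals)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(w_target)
    for _ in range(iters):
        states = rng.standard_normal((batch, w_target.size))
        actions = states @ theta + sigma * rng.standard_normal(batch)
        rewards = -(actions - states @ w_target) ** 2
        theta = theta + lr * policy_gradient(theta, states, actions,
                                             rewards, sigma)
    return theta


if __name__ == "__main__":
    # With zero feedback the CPG still oscillates on its own.
    u, v = np.array([0.1, 0.0]), np.zeros(2)
    for _ in range(2000):                  # 2 s of simulated time
        u, v = matsuoka_step(u, v, feedback=0.0)
    print("CPG outputs after 2 s:", np.maximum(u, 0.0))

    # The learned policy gains approach the toy target [0.5, -0.3].
    print("learned gains:", train_feedback_gains(np.array([0.5, -0.3])))
```

Keeping the policy input to a couple of signals loosely mirrors the abstract's point that a smaller state space makes learning more efficient: the gradient estimate only has to resolve a few gains. The batch-mean baseline is a common variance-reduction choice and is not necessarily what the paper uses.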