Policy gradient learning for a humanoid soccer robot

Authors:
A. Cherubini;F. Giannone;L. Iocchi;M. Lombardo;G. Oriolo
Affiliations:
Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Ariosto 25, 00185 Roma, Italy;Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Ariosto 25, 00185 Roma, Italy;Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Ariosto 25, 00185 Roma, Italy;Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Ariosto 25, 00185 Roma, Italy;Dipartimento di Informatica e Sistemistica, Università di Roma "La Sapienza", Via Ariosto 25, 00185 Roma, Italy
Venue:
Robotics and Autonomous Systems
Year:
2009

Abstract

In humanoid robotic soccer, many factors, both low-level (e.g., vision and motion control) and high-level (e.g., behaviors and game strategies), determine the quality of a robot's performance. In particular, the speed of individual robots, the precision of their trajectories, and the stability of their walking gaits have a high impact on the success of a team. Consequently, humanoid soccer robots require fine tuning, especially for the basic behaviors. In recent years, machine learning techniques have been used to find optimal parameter sets for various humanoid robot behaviors. However, a drawback of learning techniques is the time they consume: a practical learning method for robotic applications must be effective with a small amount of data. In this article, we compare two learning methods for humanoid walking gaits, both based on the Policy Gradient algorithm. We demonstrate that an extension of the classic Policy Gradient algorithm that takes parameter relevance into account yields better solutions when only a few experiments are available. Our experimental results show the effectiveness of the policy gradient learning method, as well as its higher convergence rate when the relevance of parameters is taken into account during learning.
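
To make the idea concrete, the following is a minimal sketch of one step of a finite-difference policy gradient search over gait parameters, in the style commonly used for legged-robot gait tuning. The abstract does not give the article's exact update rule, so everything here is an assumption: the function name `policy_gradient_step`, the `toy_score` objective, and in particular the `relevance` weights, which are only a hypothetical stand-in for "taking parameter relevance into account" and may differ from the extension the article evaluates.

```python
import math
import random


def policy_gradient_step(theta, score, epsilons, eta, n_policies, rng, relevance=None):
    """One finite-difference policy gradient step (illustrative sketch).

    `relevance` is a HYPOTHETICAL per-parameter weight in [0, 1];
    None treats all parameters equally (the classic variant).
    """
    dim = len(theta)
    if relevance is None:
        relevance = [1.0] * dim

    # Sample perturbed policies: each parameter moves by -eps, 0, or +eps.
    trials = []
    for _ in range(n_policies):
        signs = [rng.choice((-1, 0, 1)) for _ in range(dim)]
        candidate = [t + s * e for t, s, e in zip(theta, signs, epsilons)]
        trials.append((signs, score(candidate)))

    # Estimate the gradient one parameter at a time from group averages.
    gradient = []
    for i in range(dim):
        groups = {-1: [], 0: [], 1: []}
        for signs, value in trials:
            groups[signs[i]].append(value)
        if not groups[-1] or not groups[1]:
            gradient.append(0.0)  # not enough data for this parameter
            continue
        avg_minus = sum(groups[-1]) / len(groups[-1])
        avg_plus = sum(groups[1]) / len(groups[1])
        avg_zero = sum(groups[0]) / len(groups[0]) if groups[0] else -math.inf
        if avg_zero > avg_plus and avg_zero > avg_minus:
            gradient.append(0.0)  # leaving the parameter unchanged looks best
        else:
            gradient.append(relevance[i] * (avg_plus - avg_minus))

    # Move a fixed distance eta along the normalized gradient estimate.
    norm = math.sqrt(sum(g * g for g in gradient))
    if norm == 0.0:
        return list(theta)
    return [t + eta * g / norm for t, g in zip(theta, gradient)]


def toy_score(params):
    """Hypothetical objective standing in for a measured walking speed."""
    target = [0.3, -0.2]
    return -sum((p - t) ** 2 for p, t in zip(params, target))


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    for _ in range(20):
        theta = policy_gradient_step(theta, toy_score, [0.05, 0.05], 0.02, 15, rng)
    print(theta, toy_score(theta))
```

On a real robot, each call to `score` would be a walking trial, which is why the number of sampled policies per step matters so much. In this sketch, a relevance weight of zero freezes a parameter, so the fixed step length is spent entirely on the remaining parameters.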