Emulation and behavior understanding through shared values

  • Authors:
  • Yasutake Takahashi; Yoshihiro Tamura; Minoru Asada; Mario Negrello

  • Affiliations:
  • Graduate School of Engineering, Osaka University, Yamadaoka 2-1, Suita, Osaka, 565-0871, Japan; Graduate School of Engineering, Osaka University, Yamadaoka 2-1, Suita, Osaka, 565-0871, Japan; Graduate School of Engineering, Osaka University, Yamadaoka 2-1, Suita, Osaka, 565-0871, Japan, and JST ERATO Asada Synergistic Intelligence Project, Yamadaoka 2-1, Suita, Osaka, 565-0871, Japan; Fraunhofer IAIS, Schloss Birlinghoven, 53754 Sankt Augustin, Germany

  • Venue:
  • Robotics and Autonomous Systems
  • Year:
  • 2010

Abstract

Neurophysiology has revealed the existence of mirror neurons in the brain of macaque monkeys that activate both when the monkey executes a goal-directed behavior and when it observes the same behavior performed by another. The concept of mirror neurons/systems (Oztop and Kawato, 2006) [19] suggests that behavior acquisition and the understanding of others' behavior are related. We propose a method not only to learn and execute a variety of behaviors but also to understand observed behavior, supposing that the observer has already acquired the utilities (state values in a reinforcement learning scheme) of every behavior the observed agent can perform. The method needs neither a precise world model nor a coordinate transformation system to handle the view difference caused by differing viewpoints. This paper shows that an observer can understand/recognize behaviors shown by a demonstrator based not on a precise object trajectory in allocentric/egocentric coordinate space but on the estimated utility transition during the observed behavior. Furthermore, it is shown that the loop between behavior acquisition and recognition of observed behavior accelerates learning and improves recognition performance: state value updates can be accelerated by observation without real trial and error, while the learned values enrich the recognition system, since recognition is based on estimating the state value of the observed behavior. The validity of the proposed method is demonstrated by applying it to a dynamic environment in which two robots play soccer.
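
To make the abstract's two core ideas concrete, here is a minimal Python sketch of (a) recognizing an observed behavior from the transition of its estimated state value rather than from a precise object trajectory, and (b) updating state values from observation alone via a temporal-difference rule. Everything below is an illustrative assumption, not the paper's actual formulation: the names `recognize_behavior` and `update_value_from_observation`, the monotone-increase score, and the tabular TD(0) update are all hypothetical.

```python
import numpy as np

def recognize_behavior(observed_states, value_functions):
    """Return the behavior whose learned state value rises most
    consistently along the observed state sequence, plus all scores.

    value_functions: dict mapping behavior label -> V(state) callable,
    assumed to have been learned beforehand by reinforcement learning.
    """
    scores = {}
    for behavior, V in value_functions.items():
        values = np.array([V(s) for s in observed_states])
        deltas = np.diff(values)
        # Fraction of observed steps in which the estimated utility
        # increased; executing a behavior should drive its own value up.
        scores[behavior] = float(np.mean(deltas > 0.0))
    return max(scores, key=scores.get), scores

def update_value_from_observation(V, states, rewards, alpha=0.1, gamma=0.9):
    """Tabular TD(0) update of a value table V (dict: state -> value)
    from an observed trajectory, i.e. without the observer's own
    trial and error."""
    for s, r, s_next in zip(states, rewards, states[1:]):
        td_error = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
        V[s] = V.get(s, 0.0) + alpha * td_error
    return V

# Toy usage: two candidate behaviors with peaked value functions
# over scalar states (real states would be far richer).
V_approach = lambda s: -abs(s - 10.0)  # utility peaks near the ball
V_retreat = lambda s: -abs(s)          # utility peaks near own goal
label, scores = recognize_behavior(
    [2.0, 4.0, 6.0, 8.0],  # observed states moving toward 10
    {"approach": V_approach, "retreat": V_retreat},
)
print(label)  # -> "approach"
```

The sketch mirrors the claimed coupling: recognition consults the learned values, and observation feeds value updates back into learning, so each side can improve the other without extra real trials.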