View estimation learning based on value system

Authors:
Yasutake Takahashi;Kouki Shimada;Minoru Asada
Affiliations:
Dept. of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan;Dept. of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan;Dept. of Adaptive Machine Systems, Graduate School of Engineering, Osaka University, Osaka, Japan and JST ERATO
Venue:
FUZZ-IEEE'09 Proceedings of the 18th international conference on Fuzzy Systems
Year:
2009


Introduction to Reinforcement Learning

Robot Learning


Abstract

Estimating a caregiver's view is one of the most important capabilities a child needs in order to understand behavior demonstrated by the caregiver, that is, to infer the intention behind the behavior and/or to learn the observed behavior efficiently. We hypothesize that the child develops this ability in the same way as behavior learning motivated by an intrinsic reward: while imitating the behavior observed from the caregiver's demonstration, the child updates his/her own model of the estimated view so as to minimize the estimation error of the reward during that behavior. From this viewpoint, this paper presents a method for acquiring such a capability based on a value system whose values are obtained by reinforcement learning. The parameters of the view estimation are updated based on the temporal difference error (hereafter TD error: the estimation error of the state value), in the same way that the parameters of the state value of the behavior are updated based on the TD error. Experiments with simple humanoid robots show the validity of the method, and the developmental process, which parallels young children's estimation of their own view while imitating behavior demonstrated by the caregiver, is discussed.
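
The abstract does not give the update rules, so the following is only a minimal sketch of the idea that one TD error can drive both the state-value parameters and the view-estimation parameters. Everything specific here is an assumption made for illustration: the linear value function, the scalar view parameter theta (a hypothetical offset that maps an observed quantity into the learner's own state), the step sizes, and the function names. The value step is ordinary semi-gradient TD(0); the view step is a finite-difference gradient step on the squared TD error, which may differ from the rule used in the paper.

```python
GAMMA = 0.9  # discount factor (assumed value)


def value(state, weights):
    """Linear state value V(s) = w0 + w1 * s (assumed form)."""
    return weights[0] + weights[1] * state


def td_error(reward, state, next_state, weights, gamma=GAMMA):
    """TD error: r + gamma * V(s') - V(s)."""
    return reward + gamma * value(next_state, weights) - value(state, weights)


def update_value(weights, reward, state, next_state, alpha=0.1):
    """Semi-gradient TD(0) step on the state-value parameters."""
    delta = td_error(reward, state, next_state, weights)
    # Gradient of V(s) with respect to (w0, w1) is (1, s).
    return [weights[0] + alpha * delta, weights[1] + alpha * delta * state]


def update_view(theta, weights, reward, obs, next_obs, beta=0.05, eps=1e-4):
    """Gradient step on 0.5 * TD_error^2 with respect to the view parameter.

    Hypothetical view model: own-view state = observed value + theta.
    The gradient is approximated by central finite differences.
    """
    def half_sq_error(t):
        d = td_error(reward, obs + t, next_obs + t, weights)
        return 0.5 * d * d

    grad = (half_sq_error(theta + eps) - half_sq_error(theta - eps)) / (2 * eps)
    return theta - beta * grad


if __name__ == "__main__":
    w, theta = [0.0, 1.0], 0.0
    before = td_error(1.0, 0.0 + theta, 1.0 + theta, w)
    theta = update_view(theta, w, reward=1.0, obs=0.0, next_obs=1.0)
    after = td_error(1.0, 0.0 + theta, 1.0 + theta, w)
    print(abs(after) < abs(before))
```

In this sketch the two updates share the same error signal and differ only in which parameters they adjust, which is the analogy the abstract draws between view estimation and value learning.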