A Computational Model of Embodied Language Learning

  • Authors:
  • C. Yu; D. H. Ballard

  • Affiliations:
  • -;-

  • Venue:
  • -
  • Year:
  • 2003

Abstract

Language is about symbols, and those symbols must be grounded in the physical environment during human development. Recently, there has been increased awareness of the essential role that inferences of speakers' referential intentions play in grounding those symbols. Experiments have shown that these inferences, as revealed in eye, head, and hand movements, serve as an important driving force in language learning at a relatively early age. The challenge ahead is to develop formal models of language acquisition that can shed light on the leverage provided by embodiment. We present an implemented computational model of embodied language acquisition that learns words from natural interactions with users. The system can be trained in an unsupervised mode in which users perform everyday tasks while providing natural language descriptions of their behaviors. We collect acoustic signals in concert with user-centric multisensory information from non-speech modalities, such as the user's perspective video, gaze positions, head directions, and hand movements. A multimodal learning algorithm is developed that first spots words from continuous speech and then associates action verbs and object names with their grounded meanings. The central idea is to make use of non-speech contextual information to facilitate word spotting, and to utilize the user's attention as a deictic reference that reveals temporal correlations among data from different modalities, from which lexical items are built. We report the results of a series of experiments that demonstrate the effectiveness of our approach.
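
To make the word-meaning association step concrete, below is a minimal Python sketch of cross-situational learning with an EM loop in the spirit of statistical translation models. It is an illustration under stated assumptions, not the paper's actual implementation: the episode data, the iteration count, and the 0.5 confidence threshold are all hypothetical choices for this example.

```python
from collections import defaultdict

# Hypothetical training episodes: word candidates spotted from one
# utterance, paired with the meanings the user attended to (via gaze,
# head, and hand cues) during the same time window.
episodes = [
    (["stapling", "the", "paper"],        ["STAPLE", "PAPER"]),
    (["picking", "up", "the", "stapler"], ["PICK-UP", "STAPLER"]),
    (["folding", "the", "paper"],         ["FOLD", "PAPER"]),
    (["stapling", "these", "sheets"],     ["STAPLE", "PAPER"]),
]

words = sorted({w for ws, _ in episodes for w in ws})
meanings = sorted({m for _, ms in episodes for m in ms})

# t[w][m] approximates P(meaning m | word w), initialized uniformly.
t = {w: {m: 1.0 / len(meanings) for m in meanings} for w in words}

for _ in range(20):  # EM iterations
    counts = defaultdict(float)  # expected (word, meaning) counts
    totals = defaultdict(float)  # expected counts per word
    for ws, ms in episodes:
        for m in ms:
            # E-step: credit for meaning m is shared among the words
            # in this utterance in proportion to current beliefs.
            z = sum(t[w][m] for w in ws)
            for w in ws:
                c = t[w][m] / z
                counts[(w, m)] += c
                totals[w] += c
    # M-step: renormalize expected counts into probabilities.
    for w in words:
        for m in meanings:
            t[w][m] = counts[(w, m)] / totals[w] if totals[w] else 0.0

# Keep only confident word-meaning pairs as lexical items
# (the 0.5 threshold is an arbitrary illustrative choice).
lexicon = {w: max(t[w], key=t[w].get) for w in words
           if max(t[w].values()) > 0.5}
print(lexicon)  # e.g. 'stapling' -> 'STAPLE', 'paper' -> 'PAPER'
```

With enough episodes, words that consistently co-occur with an attended object or action converge to high association probabilities, while function words such as "the" spread their probability mass across many meanings and fall below the threshold. In the paper's setting, the deictic attention cues further constrain which meanings enter each episode's context set, sharpening these statistics.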