A Computational Model of Embodied Language Learning

  • Authors:
  • C. Yu; D. H. Ballard

  • Affiliations:
  • -;-

  • Venue:
  • -
  • Year:
  • 2003

Abstract

Language is about symbols, and those symbols must be grounded in the physical environment during human development. Recently, there has been increased awareness of the essential role that inferences of speakers' referential intentions play in grounding those symbols. Experiments have shown that these inferences, as revealed in eye, head, and hand movements, serve as an important driving force in language learning at a relatively early age. The challenge ahead is to develop formal models of language acquisition that can shed light on the leverage provided by embodiment. We present an implemented computational model of embodied language acquisition that learns words from natural interactions with users. The system can be trained in an unsupervised mode in which users perform everyday tasks while providing natural language descriptions of their behaviors. We collect acoustic signals in concert with user-centric multisensory information from non-speech modalities, such as the user's perspective video, gaze positions, head directions, and hand movements. A multimodal learning algorithm is developed that first spots words from continuous speech and then associates action verbs and object names with their grounded meanings. The central idea is to make use of non-speech contextual information to facilitate word spotting, and to utilize the user's attention as a deictic reference that reveals temporal correlations among data from different modalities, from which lexical items are built. We report the results of a series of experiments that demonstrate the effectiveness of our approach.
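
To make the word-meaning association step concrete, below is a minimal Python sketch of cross-situational learning with an EM loop in the spirit of statistical translation models. It is an illustration under stated assumptions, not the paper's actual implementation: the episode data, the iteration count, and the 0.5 confidence threshold are all hypothetical choices for this example.

```python
from collections import defaultdict

# Hypothetical training episodes: word candidates spotted from one
# utterance, paired with the meanings the user attended to (via gaze,
# head, and hand cues) during the same time window.
episodes = [
    (["stapling", "the", "paper"],        ["STAPLE", "PAPER"]),
    (["picking", "up", "the", "stapler"], ["PICK-UP", "STAPLER"]),
    (["folding", "the", "paper"],         ["FOLD", "PAPER"]),
    (["stapling", "these", "sheets"],     ["STAPLE", "PAPER"]),
]

words = sorted({w for ws, _ in episodes for w in ws})
meanings = sorted({m for _, ms in episodes for m in ms})

# t[w][m] approximates P(meaning m | word w), initialized uniformly.
t = {w: {m: 1.0 / len(meanings) for m in meanings} for w in words}

for _ in range(20):  # EM iterations
    counts = defaultdict(float)  # expected (word, meaning) counts
    totals = defaultdict(float)  # expected counts per word
    for ws, ms in episodes:
        for m in ms:
            # E-step: credit for meaning m is shared among the words
            # in this utterance in proportion to current beliefs.
            z = sum(t[w][m] for w in ws)
            for w in ws:
                c = t[w][m] / z
                counts[(w, m)] += c
                totals[w] += c
    # M-step: renormalize expected counts into probabilities.
    for w in words:
        for m in meanings:
            t[w][m] = counts[(w, m)] / totals[w] if totals[w] else 0.0

# Keep only confident word-meaning pairs as lexical items
# (the 0.5 threshold is an arbitrary illustrative choice).
lexicon = {w: max(t[w], key=t[w].get) for w in words
           if max(t[w].values()) > 0.5}
print(lexicon)  # e.g. 'stapling' -> 'STAPLE', 'paper' -> 'PAPER'
```

With enough episodes, words that consistently co-occur with an attended object or action converge to high association probabilities, while function words such as "the" spread their probability mass across many meanings and fall below the threshold. In the paper's setting, the deictic attention cues further constrain which meanings enter each episode's context set, sharpening these statistics.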