EBLA: a perceptually grounded model of language acquisition

Authors:
Brian E. Pangburn;Robert C. Mathews;S. Sitharama Iyengar;Jonathan P. Ayo
Affiliations:
The Pangburn Company, Inc., New Roads, LA;Louisiana State University, Baton Rouge, LA;Louisiana State University, Baton Rouge, LA;Louisiana State University, Baton Rouge, LA
Venue:
HLT-NAACL-LWM '04 Proceedings of the HLT-NAACL 2003 workshop on Learning word meaning from non-linguistic data - Volume 6
Year:
2003

Abstract

This paper introduces an open computational framework for visual perception and grounded language acquisition called Experience-Based Language Acquisition (EBLA). EBLA can "watch" a series of short videos and acquire a simple language of nouns and verbs corresponding to the objects and object-object relations in those videos. Upon acquiring this protolanguage, EBLA can perform basic scene analysis to generate descriptions of novel videos.

The performance of EBLA has been evaluated based on accuracy and speed of protolanguage acquisition as well as on accuracy of generated scene descriptions. For a test set of simple animations, EBLA had average acquisition success rates as high as 100% and average description success rates as high as 96.7%. For a larger set of real videos, EBLA had average acquisition success rates as high as 95.8% and average description success rates as high as 65.3%. The lower description success rate for the videos is attributed to the wide variance in the appearance of objects across the test set.

While there have been several systems capable of learning object or event labels for videos, EBLA is the first known system to acquire both nouns and verbs using a grounded computer vision system.