The current research presents a system that learns to understand object names, spatial relation terms, and event descriptions by observing narrated action sequences. The system extracts meaning from observed visual scenes by exploiting perceptual primitives related to motion and contact, representing events and spatial relations as predicate-argument structures. Learning the mapping between sentences and the predicate-argument representations of the situations they describe yields a small lexicon and a structured set of sentence form-to-meaning mappings, or simplified grammatical constructions. The acquired grammatical construction knowledge generalizes, allowing the system to correctly understand new sentences not used in training. In the context of discourse, the grammatical constructions are used in the inverse direction to generate sentences from meanings, allowing the system to describe visual scenes that it perceives. In question-and-answer dialogs with naive users, the system exploits pragmatic cues to select the grammatical constructions that are most relevant in the discourse structure. While the system has a number of limitations, which are discussed, this research demonstrates how concepts borrowed from the construction grammar framework can aid in taking initial steps toward building systems that acquire and produce event language through interaction with the world.
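The core mechanism described above — pairing sentences with predicate-argument meanings, abstracting the open-class words into slots so that the remaining closed-class pattern identifies a construction, and then running the same mapping in both directions for comprehension and production — can be illustrated with a toy sketch. This is an illustrative simplification under assumed conventions, not the paper's implementation; the word list, slot notation, and function names are invented for the example.

```python
# Toy sketch of form-to-meaning "grammatical constructions" (an illustration,
# not the system described in the abstract).
# A meaning is a predicate-argument tuple, e.g. ("touched", "block", "ball").
# Open-class words fill numbered slots; the residual closed-class pattern
# (e.g. "the <1> <2> the <3>") identifies the construction.

OPEN_CLASS = {"block", "ball", "cylinder", "touched", "pushed", "gave"}

def abstract(sentence):
    """Replace open-class words with numbered slots; return (pattern, fillers)."""
    pattern, fillers = [], []
    for w in sentence.split():
        if w in OPEN_CLASS:
            fillers.append(w)
            pattern.append(f"<{len(fillers)}>")
        else:
            pattern.append(w)
    return tuple(pattern), fillers

def learn(pairs):
    """From (sentence, meaning) pairs, record which sentence slot supplies
    each meaning role, keyed by the closed-class pattern."""
    constructions = {}
    for sentence, meaning in pairs:
        pattern, fillers = abstract(sentence)
        constructions[pattern] = [fillers.index(m) for m in meaning]
    return constructions

def comprehend(constructions, sentence):
    """Map a sentence to its predicate-argument meaning."""
    pattern, fillers = abstract(sentence)
    order = constructions[pattern]
    return tuple(fillers[i] for i in order)

def produce(constructions, meaning):
    """Inverse use: pick a matching construction and fill its slots."""
    for pattern, order in constructions.items():
        if len(order) == len(meaning):
            slot_words = [None] * len(order)
            for role, slot in enumerate(order):
                slot_words[slot] = meaning[role]
            it = iter(slot_words)
            return " ".join(next(it) if w.startswith("<") else w
                            for w in pattern)
    return None

# One training pair suffices for this construction; it then generalizes
# to new sentences built from the same closed-class pattern.
cons = learn([("the block touched the ball", ("touched", "block", "ball"))])
print(comprehend(cons, "the cylinder pushed the block"))
# → ('pushed', 'cylinder', 'block')
print(produce(cons, ("gave", "ball", "cylinder")))
# → 'the ball gave the cylinder'
```

The key design point mirrored here is that generalization comes for free: any new sentence whose closed-class skeleton matches a learned pattern is understood by reusing the stored slot-to-role ordering, and production simply inverts that ordering.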