Understanding complex visually referring utterances

  • Authors:
  • Peter Gorniak; Deb Roy

  • Affiliations:
  • Cognitive Machines Group, MIT Media Laboratory (both authors)

  • Venue:
  • HLT-NAACL-LWM '03: Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-Linguistic Data - Volume 6
  • Year:
  • 2003


Abstract

We propose a computational model of visually grounded spatial language understanding, based on a study of how people verbally describe objects in visual scenes. We describe our implementation of word-level visually grounded semantics and their embedding in a compositional parsing framework. The implemented system correctly selects referents for a broad range of referring expressions in a large percentage of test cases. In an analysis of the system's successes and failures, we show how visual context influences the semantics of utterances and propose future extensions to the model that take such context into account.
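To make the abstract's core idea concrete, the sketch below illustrates one simple reading of "word-level visually grounded semantics embedded in a compositional framework": each word's meaning is a function that filters or selects from a set of candidate scene objects, and interpreting an utterance means composing those functions over the scene. This is a minimal, hypothetical illustration, not the authors' actual system; the names (Obj, LEXICON, interpret), the toy lexicon, and the inside-out application order are all assumptions made for clarity.

```python
from dataclasses import dataclass

@dataclass
class Obj:
    name: str
    x: float      # horizontal position, 0 = left edge of the scene
    y: float
    color: str

# Word-level grounded semantics: each lexical entry maps the current
# candidate set (plus the full scene) to a narrowed candidate set.
def color_word(color):
    def apply(candidates, scene):
        return [o for o in candidates if o.color == color]
    return apply

def leftmost(candidates, scene):
    # A spatial term grounded directly in scene geometry.
    return [min(candidates, key=lambda o: o.x)] if candidates else []

LEXICON = {
    "green": color_word("green"),
    "purple": color_word("purple"),
    "leftmost": leftmost,
}

def interpret(words, scene):
    """Compose grounded word meanings over the scene. Modifiers apply
    inside-out (nearest the head noun first), so 'leftmost green cone'
    filters by color before selecting by position."""
    candidates = list(scene)
    for w in reversed(words):
        if w in LEXICON:
            candidates = LEXICON[w](candidates, scene)
    return candidates

scene = [Obj("a", 0.1, 0.5, "green"),
         Obj("b", 0.8, 0.2, "green"),
         Obj("c", 0.4, 0.9, "purple")]

print(interpret("the leftmost green cone".split(), scene))
# -> [Obj(name='a', x=0.1, y=0.5, color='green')]
```

One design point the toy example surfaces: applying "leftmost" before filtering by "green" would pick the leftmost object of any color, so the order in which grounded word meanings compose changes which referent is selected. This is the kind of context sensitivity the abstract's analysis of successes and failures points toward.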