A simple method for resolution of definite reference in a shared visual context

Authors:
Alexander Siebert;David Schlangen
Affiliations:
Berlin-Brandenburgische Akademie der Wissenschaften;University of Potsdam, Germany
Venue:
SIGdial '08 Proceedings of the 9th SIGdial Workshop on Discourse and Dialogue
Year:
2008

Abstract

We present a method for resolving definite exophoric reference to visually shared objects that is based on a) an automatically learned, simple mapping of words to visual features ("visual word semantics"), b) an automatically learned, semantically motivated utterance segmentation ("visual grammar"), and c) a procedure that, given an utterance, uses b) to combine a) to yield a resolution. We evaluated the method both on a pre-recorded corpus and in an online setting, where it reached 81% accuracy (chance: 14%) and 66% accuracy, respectively. This is comparable to results reported in related work on simpler settings.
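
To make the three components concrete, the following is a minimal sketch of how a word-to-feature mapping and a segmentation could be combined to pick a referent. It is not the authors' implementation: the abstract does not specify the learning procedure, the feature set, or the combination rule. Everything here is an assumption made for illustration, including the function names (`learn_word_prototypes`, `resolve`), the two-dimensional feature vectors (redness, size), the use of mean feature vectors as word meanings, Euclidean distance as the score, and the segmentation being supplied by hand rather than learned.

```python
import math
from collections import defaultdict


def learn_word_prototypes(examples):
    """Map each word to the mean feature vector of the objects it described.

    examples: list of (words, feature_vector_of_intended_referent) pairs.
    Assumption: a mean vector stands in for the learned "visual word semantics".
    """
    sums = {}
    counts = defaultdict(int)
    for words, feats in examples:
        for w in words:
            if w not in sums:
                sums[w] = [0.0] * len(feats)
            sums[w] = [s + f for s, f in zip(sums[w], feats)]
            counts[w] += 1
    return {w: [s / counts[w] for s in vec] for w, vec in sums.items()}


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def resolve(segments, prototypes, scene):
    """Return the id of the object in `scene` that best fits the utterance.

    segments: list of word lists (hypothetical, hand-supplied segmentation).
    scene: dict mapping object id -> feature vector.
    Words inside a segment are averaged; segment costs are summed.
    If no word is known, every cost is 0 and the first object is returned.
    """
    best_id, best_cost = None, float("inf")
    for obj_id, feats in scene.items():
        cost = 0.0
        for seg in segments:
            known = [prototypes[w] for w in seg if w in prototypes]
            if not known:
                continue  # skip segments with no known words
            seg_proto = [sum(col) / len(known) for col in zip(*known)]
            cost += distance(seg_proto, feats)
        if cost < best_cost:
            best_id, best_cost = obj_id, cost
    return best_id


# Toy training data: features are [redness, size], both in [0, 1].
TRAINING = [
    (["red", "big"], [1.0, 1.0]),
    (["red", "small"], [1.0, 0.0]),
    (["blue", "big"], [0.0, 1.0]),
    (["blue", "small"], [0.0, 0.0]),
]
PROTOTYPES = learn_word_prototypes(TRAINING)

# Toy scene with three candidate objects.
SCENE = {"a": [1.0, 1.0], "b": [0.0, 1.0], "c": [1.0, 0.0]}

print(resolve([["red"], ["big"]], PROTOTYPES, SCENE))
print(resolve([["blue"]], PROTOTYPES, SCENE))
```

In this toy setup, "red" and "big" each fit two objects about equally well when taken alone, and only their combination singles out one candidate. That is the role the abstract assigns to step c), although the actual combination procedure in the paper may differ substantially.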