In recent years, a number of psycholinguistic experiments have pointed to the interaction between language and vision, in particular the interaction between visual attention and linguistic reference. In parallel, several theories of discourse have attempted to account for the relationship between the form of a referential expression and the degree of mental activation of its referent. Building on both of these traditions, this paper describes an attention-based approach to visually situated reference resolution. The framework uses the relationship between referential form and preferred mode of interpretation as the basis for a weighted integration of linguistic and visual attention scores for each entity in the multimodal context. The resulting integrated attention scores are then used to rank the candidate referents during resolution, with the highest-scoring candidate selected as the referent. One advantage of this approach is that resolution occurs within the full multimodal context, in so far as the referent is selected from the complete list of objects in that context. As a result, situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Moreover, the system can recognise situations where attention cues from different modalities make a reference potentially ambiguous.
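The core mechanism described above — weighting linguistic and visual attention scores by referential form, ranking candidates, and flagging near-ties as ambiguous — can be illustrated with a minimal sketch. The weights, score ranges, form labels, and ambiguity margin below are illustrative assumptions, not values from the paper.

```python
def resolve_reference(candidates, ref_form, ambiguity_margin=0.1):
    """Rank candidate referents by a weighted sum of attention scores.

    candidates: list of dicts with 'id', 'linguistic', and 'visual'
        attention scores, each assumed to lie in [0, 1].
    ref_form: referential form label, e.g. 'pronoun' or 'definite_np'.
    Returns (best_id, ambiguous_flag).
    """
    # Assumed weighting scheme: pronouns prefer linguistic (discourse)
    # salience, definite NPs prefer visual salience; unknown forms
    # fall back to an even split.
    weights = {
        'pronoun': (0.7, 0.3),
        'definite_np': (0.3, 0.7),
    }
    w_ling, w_vis = weights.get(ref_form, (0.5, 0.5))

    # Score every object in the multimodal context, then rank.
    scored = sorted(
        ((w_ling * c['linguistic'] + w_vis * c['visual'], c['id'])
         for c in candidates),
        reverse=True,
    )
    best_score, best_id = scored[0]
    # Flag potential ambiguity when the runner-up is nearly as salient.
    ambiguous = (len(scored) > 1
                 and best_score - scored[1][0] < ambiguity_margin)
    return best_id, ambiguous


# Hypothetical two-object context: 'cup' is discourse-salient,
# 'box' is visually salient.
context = [
    {'id': 'cup', 'linguistic': 0.9, 'visual': 0.2},
    {'id': 'box', 'linguistic': 0.1, 'visual': 0.8},
]
print(resolve_reference(context, 'pronoun'))      # → ('cup', False)
print(resolve_reference(context, 'definite_np'))  # → ('box', False)
```

Because every object in the context is scored rather than filtered out early, the intended target can never be excluded by a single hard assumption, and a small gap between the top two scores surfaces directly as an ambiguity flag.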