In recent years, a number of psycholinguistic experiments have pointed to the interaction between language and vision, in particular the interaction between visual attention and linguistic reference. In parallel, several theories of discourse have attempted to account for the relationship between the form of a referential expression and the degree of mental activation of its referent. Building on both of these traditions, this paper describes an attention-based approach to visually situated reference resolution. The framework uses the relationship between referential form and preferred mode of interpretation as the basis for a weighted integration of linguistic and visual attention scores for each entity in the multimodal context. The resulting integrated attention scores are then used to rank the candidate referents during resolution, with the highest-scoring candidate selected as the referent. One advantage of this approach is that resolution occurs within the full multimodal context, in so far as the referent is selected from the complete list of objects in that context. As a result, situations where the intended target of the reference is erroneously excluded, due to an individual assumption within the resolution process, are avoided. Moreover, the system can recognise situations where attention cues from different modalities make a reference potentially ambiguous.
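The core mechanism described above — weighting linguistic and visual attention scores by referential form, ranking candidates, and flagging near-ties as ambiguous — can be illustrated with a minimal sketch. The weights, score ranges, form labels, and ambiguity margin below are illustrative assumptions, not values from the paper.

```python
def resolve_reference(candidates, ref_form, ambiguity_margin=0.1):
    """Rank candidate referents by a weighted sum of attention scores.

    candidates: list of dicts with 'id', 'linguistic', and 'visual'
        attention scores, each assumed to lie in [0, 1].
    ref_form: referential form label, e.g. 'pronoun' or 'definite_np'.
    Returns (best_id, ambiguous_flag).
    """
    # Assumed weighting scheme: pronouns prefer linguistic (discourse)
    # salience, definite NPs prefer visual salience; unknown forms
    # fall back to an even split.
    weights = {
        'pronoun': (0.7, 0.3),
        'definite_np': (0.3, 0.7),
    }
    w_ling, w_vis = weights.get(ref_form, (0.5, 0.5))

    # Score every object in the multimodal context, then rank.
    scored = sorted(
        ((w_ling * c['linguistic'] + w_vis * c['visual'], c['id'])
         for c in candidates),
        reverse=True,
    )
    best_score, best_id = scored[0]
    # Flag potential ambiguity when the runner-up is nearly as salient.
    ambiguous = (len(scored) > 1
                 and best_score - scored[1][0] < ambiguity_margin)
    return best_id, ambiguous


# Hypothetical two-object context: 'cup' is discourse-salient,
# 'box' is visually salient.
context = [
    {'id': 'cup', 'linguistic': 0.9, 'visual': 0.2},
    {'id': 'box', 'linguistic': 0.1, 'visual': 0.8},
]
print(resolve_reference(context, 'pronoun'))      # → ('cup', False)
print(resolve_reference(context, 'definite_np'))  # → ('box', False)
```

Because every object in the context is scored rather than filtered out early, the intended target can never be excluded by a single hard assumption, and a small gap between the top two scores surfaces directly as an ambiguity flag.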