Between linguistic attention and gaze fixations in multimodal conversational interfaces

Authors:
Rui Fang; Joyce Y. Chai; Fernanda Ferreira
Affiliations:
Michigan State University, East Lansing, MI, USA; Michigan State University, East Lansing, MI, USA; University of Edinburgh, Edinburgh, United Kingdom
Venue:
Proceedings of the 2009 international conference on Multimodal interfaces
Year:
2009

Abstract

In multimodal human-machine conversation, successfully interpreting human attention is critical. While attention has been studied extensively in both linguistic processing and visual processing, it is not clear how linguistic attention aligns with visual attention in multimodal conversational interfaces. To address this issue, we conducted a preliminary investigation of how attention reflected in linguistic discourse aligns with attention indicated by gaze fixations during human-machine conversation. Our empirical findings show that entities that are more attended to in the linguistic discourse receive a higher intensity of gaze fixations, and that the smoother a linguistic transition is, the smaller the distance between the corresponding fixation distributions. These findings provide insight into how language and gaze can be combined to predict attention, which has important implications for tasks such as word acquisition and object recognition.