In multimodal human-machine conversation, successfully interpreting human attention is critical. While attention has been studied extensively in both linguistic and visual processing, it is not clear how linguistic attention aligns with visual attention in multimodal conversational interfaces. To address this issue, we conducted a preliminary investigation of how attention reflected in linguistic discourse aligns with attention indicated by gaze fixations during human-machine conversation. Our empirical findings show that entities that are more attended in the linguistic discourse correspond to a higher intensity of gaze fixations, and that the smoother a linguistic transition is, the smaller the distance between the corresponding fixation distributions. These findings provide insight into how language and gaze can be combined to predict attention, which has important implications for tasks such as word acquisition and object recognition.