Where to look: a study of human-robot engagement
Proceedings of the 9th international conference on Intelligent user interfaces
Visual Salience and Reference Resolution in Simulated 3-D Environments
Artificial Intelligence Review
Conversing with the user based on eye-gaze patterns
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Towards a model of face-to-face grounding
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Recognizing gaze aversion gestures in embodied conversational discourse
Proceedings of the 8th international conference on Multimodal interfaces
Proceedings of the 13th international conference on Intelligent user interfaces
IVA '09 Proceedings of the 9th International Conference on Intelligent Virtual Agents
Between linguistic attention and gaze fixations in multimodal conversational interfaces
Proceedings of the 2009 international conference on Multimodal interfaces
Context-based word acquisition for situated dialogue in a virtual world
Journal of Artificial Intelligence Research
Utilizing visual attention for cross-modal coreference interpretation
CONTEXT'05 Proceedings of the 5th international conference on Modeling and Using Context
Shared gaze in remote spoken HRI during distributed military operation
HRI '12 Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction
Integrating word acquisition and referential grounding towards physical world interaction
Proceedings of the 14th ACM international conference on Multimodal interaction
Towards mediating shared perceptual basis in situated dialogue
SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
In situated dialogue, humans often utter linguistic expressions that refer to extralinguistic entities in the environment. Correctly resolving these references is critical yet challenging for artificial agents, partly because of their limited speech recognition and language understanding capabilities. Motivated by psycholinguistic studies demonstrating a tight link between language production and human eye gaze, we have developed approaches that integrate naturally occurring human eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue in a virtual world. Beyond incorporating eye gaze with the single best recognized hypothesis, we developed an algorithm that also handles multiple hypotheses modeled as word confusion networks. Our empirical results demonstrate that incorporating eye gaze with recognition hypotheses consistently outperforms processing recognition hypotheses alone, and that incorporating eye gaze with word confusion networks improves performance further.
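The abstract's idea of combining gaze with uncertain speech hypotheses can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the function name, the linear mixing weight `alpha`, and all probabilities below are illustrative assumptions. It scores each candidate referent by mixing the confusion network's posterior mass over words that could denote it with a gaze-derived salience score.

```python
# Hypothetical sketch (not the paper's published algorithm): fuse word
# confusion network posteriors with gaze fixation salience to pick a referent.

def resolve_reference(confusion_slot, gaze_salience, lexicon, alpha=0.5):
    """Score each candidate object by mixing speech and gaze evidence.

    confusion_slot: dict word -> posterior prob from one confusion network slot
    gaze_salience:  dict object -> normalized fixation salience
    lexicon:        dict word -> set of objects the word can denote
    alpha:          illustrative weight on speech vs. gaze evidence
    """
    scores = {}
    for obj, salience in gaze_salience.items():
        # Speech evidence: total posterior mass of words that can denote obj.
        speech = sum(p for w, p in confusion_slot.items()
                     if obj in lexicon.get(w, set()))
        scores[obj] = alpha * speech + (1 - alpha) * salience
    return max(scores, key=scores.get)

# Example: the recognizer is unsure between "cup" and "cap", but gaze
# fixations favor the mug, so the mug wins despite the ambiguity.
slot = {"cup": 0.45, "cap": 0.40, "cop": 0.15}
salience = {"mug1": 0.7, "hat1": 0.3}
lexicon = {"cup": {"mug1"}, "cap": {"hat1"}}
print(resolve_reference(slot, salience, lexicon))  # -> mug1
```

The key point the sketch captures is that gaze acts as a tie-breaker when recognition hypotheses are close, rather than replacing the speech channel.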