Where to look: a study of human-robot engagement
Proceedings of the 9th international conference on Intelligent user interfaces
Visual Salience and Reference Resolution in Simulated 3-D Environments
Artificial Intelligence Review
Conversing with the user based on eye-gaze patterns
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Towards a model of face-to-face grounding
ACL '03 Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1
Recognizing gaze aversion gestures in embodied conversational discourse
Proceedings of the 8th international conference on Multimodal interfaces
Proceedings of the 13th international conference on Intelligent user interfaces
IVA '09 Proceedings of the 9th International Conference on Intelligent Virtual Agents
Between linguistic attention and gaze fixations in multimodal conversational interfaces
Proceedings of the 2009 international conference on Multimodal interfaces
Context-based word acquisition for situated dialogue in a virtual world
Journal of Artificial Intelligence Research
Utilizing visual attention for cross-modal coreference interpretation
CONTEXT'05 Proceedings of the 5th international conference on Modeling and Using Context
Shared gaze in remote spoken HRI during distributed military operation
HRI '12 Proceedings of the seventh annual ACM/IEEE international conference on Human-Robot Interaction
Integrating word acquisition and referential grounding towards physical world interaction
Proceedings of the 14th ACM international conference on Multimodal interaction
Towards mediating shared perceptual basis in situated dialogue
SIGDIAL '12 Proceedings of the 13th Annual Meeting of the Special Interest Group on Discourse and Dialogue
In situated dialogue, humans often utter linguistic expressions that refer to extralinguistic entities in the environment. Correctly resolving these references is critical yet challenging for artificial agents, partly because of their limited speech recognition and language understanding capabilities. Motivated by psycholinguistic studies demonstrating a tight link between language production and human eye gaze, we have developed approaches that integrate naturally occurring human eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue in a virtual world. Beyond incorporating eye gaze with the single best recognized hypothesis, we developed an algorithm that also handles multiple hypotheses modeled as word confusion networks. Our empirical results demonstrate that incorporating eye gaze with recognition hypotheses consistently outperforms processing recognition hypotheses alone, and that incorporating eye gaze with word confusion networks improves performance further.
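The abstract's idea of combining gaze with uncertain speech hypotheses can be illustrated with a minimal sketch. This is not the paper's actual algorithm: the function name, the linear mixing weight `alpha`, and all probabilities below are illustrative assumptions. It scores each candidate referent by mixing the confusion network's posterior mass over words that could denote it with a gaze-derived salience score.

```python
# Hypothetical sketch (not the paper's published algorithm): fuse word
# confusion network posteriors with gaze fixation salience to pick a referent.

def resolve_reference(confusion_slot, gaze_salience, lexicon, alpha=0.5):
    """Score each candidate object by mixing speech and gaze evidence.

    confusion_slot: dict word -> posterior prob from one confusion network slot
    gaze_salience:  dict object -> normalized fixation salience
    lexicon:        dict word -> set of objects the word can denote
    alpha:          illustrative weight on speech vs. gaze evidence
    """
    scores = {}
    for obj, salience in gaze_salience.items():
        # Speech evidence: total posterior mass of words that can denote obj.
        speech = sum(p for w, p in confusion_slot.items()
                     if obj in lexicon.get(w, set()))
        scores[obj] = alpha * speech + (1 - alpha) * salience
    return max(scores, key=scores.get)

# Example: the recognizer is unsure between "cup" and "cap", but gaze
# fixations favor the mug, so the mug wins despite the ambiguity.
slot = {"cup": 0.45, "cap": 0.40, "cop": 0.15}
salience = {"mug1": 0.7, "hat1": 0.3}
lexicon = {"cup": {"mug1"}, "cap": {"hat1"}}
print(resolve_reference(slot, salience, lexicon))  # -> mug1
```

The key point the sketch captures is that gaze acts as a tie-breaker when recognition hypotheses are close, rather than replacing the speech channel.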