Motivated by the psycholinguistic finding that human eye gaze is tightly linked to speech production, previous work has applied naturally occurring eye gaze to automatic vocabulary acquisition. However, unlike in typical psycholinguistic settings, eye gaze can serve different functions in human-machine conversation. Some gaze streams do not link to the content of the spoken utterances and can therefore be detrimental to word acquisition. To address this problem, this paper investigates incorporating interactivity into the identification of closely coupled speech and gaze streams for word acquisition. Our empirical results indicate that automatically identifying closely coupled gaze-speech streams leads to significantly better word acquisition performance.
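The core idea — filter out gaze streams that are not coupled to the speech before using them for word learning — can be illustrated with a minimal sketch. This is not the paper's actual method; the temporal-overlap coupling test, the `max_gap` threshold, and the simple word-object co-occurrence counting are all illustrative assumptions.

```python
from collections import Counter, defaultdict

def coupled(speech_span, fixations, max_gap=1.5):
    """Heuristic coupling test (illustrative, not the paper's model):
    the gaze stream counts as closely coupled if some fixation falls
    within max_gap seconds of the utterance's time span."""
    start, end = speech_span
    return any(f_start <= end + max_gap and f_end >= start - max_gap
               for (f_start, f_end, _) in fixations)

def acquire(utterances, max_gap=1.5):
    """Count word / fixated-object co-occurrences, but only over
    utterances whose gaze stream passes the coupling test."""
    counts = defaultdict(Counter)
    for words, span, fixations in utterances:
        if not coupled(span, fixations, max_gap):
            continue  # discard gaze that does not ground the speech
        objects = {obj for (_, _, obj) in fixations}
        for w in words:
            for obj in objects:
                counts[w][obj] += 1
    return counts
```

For example, an utterance whose fixations occur many seconds away from the speech contributes nothing, so its unrelated gaze cannot corrupt the learned word-object associations.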