To tackle the vocabulary problem in conversational systems, previous work has applied unsupervised learning approaches to co-occurring speech and eye gaze during interaction to automatically acquire new words. Although these approaches have shown promise, several issues related to human language behavior and human-machine conversation have not been addressed. First, psycholinguistic studies have shown certain temporal regularities between human eye movement and language production. While these regularities could potentially guide the acquisition process, they have not been incorporated into previous unsupervised approaches. Second, conversational systems generally have an existing knowledge base about the domain and vocabulary. While this existing knowledge could help bootstrap and constrain the acquired new words, it has not been incorporated into previous models. Third, eye gaze can serve different functions in human-machine conversation. Some gaze streams may not be closely coupled with the speech stream and are therefore potentially detrimental to word acquisition, so automated recognition of closely coupled speech-gaze streams based on conversation context is important. To address these issues, we developed new approaches that incorporate user language behavior, domain knowledge, and conversation context into word acquisition. We evaluated these approaches in the context of situated dialogue in a virtual world. Our experimental results show that incorporating these three types of contextual information significantly improves word acquisition performance.
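The unsupervised acquisition setting described above can be sketched with a toy example. The snippet below is a minimal illustration, not the paper's actual model: it treats each utterance paired with the entities fixated during that utterance as parallel data and runs IBM Model 1 style EM to estimate p(word | entity), then takes the highest-probability word per entity as its acquired label. All data, entity names, and parameter choices here are hypothetical.

```python
from collections import defaultdict

# Hypothetical parallel data: (words spoken, entities fixated during the utterance).
pairs = [
    (["the", "red", "lamp"], ["lamp"]),
    (["lamp", "near", "vase"], ["lamp", "vase"]),
    (["the", "blue", "vase"], ["vase"]),
]

entities = {e for _, es in pairs for e in es}
words = {w for ws, _ in pairs for w in ws}

# Uniform initialization of the translation table p(word | entity).
t = {e: {w: 1.0 / len(words) for w in words} for e in entities}

for _ in range(20):  # EM iterations (IBM Model 1 style)
    count = {e: defaultdict(float) for e in entities}
    for ws, es in pairs:
        for w in ws:
            # E-step: distribute each word occurrence over the gazed entities.
            z = sum(t[e].get(w, 0.0) for e in es)
            for e in es:
                count[e][w] += t[e].get(w, 0.0) / z
    # M-step: renormalize counts into probabilities.
    for e in entities:
        total = sum(count[e].values())
        t[e] = {w: c / total for w, c in count[e].items()}

# Acquired word for each entity = argmax p(word | entity).
acquired = {e: max(t[e], key=t[e].get) for e in entities}
print(acquired)  # → {'lamp': 'lamp', 'vase': 'vase'} (in some order)
```

In this toy run, EM pushes probability mass toward words that co-occur consistently with an entity ("lamp", "vase") and away from function words like "the" that co-occur with everything; the contextual factors in the abstract (temporal gaze-speech regularities, domain knowledge, and gaze-stream filtering) would enter as priors on, or filters over, these co-occurrence counts.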