A probabilistic approach to reference resolution in multimodal user interfaces

Authors:
Joyce Y. Chai; Pengyu Hong; Michelle X. Zhou
Affiliations:
Michigan State University, East Lansing, MI; Harvard University, Cambridge, MA; IBM T. J. Watson Research Center, Hawthorne, NY
Venue:
Proceedings of the 9th international conference on Intelligent user interfaces
Year:
2004

Abstract

Multimodal user interfaces allow users to interact with computers through multiple modalities, such as speech, gesture, and gaze. To be effective, such interfaces must correctly identify every object that users refer to in their inputs. To resolve different types of references systematically, we have developed a probabilistic approach that uses a graph-matching algorithm. Our approach identifies the most probable referents by simultaneously optimizing the satisfaction of semantic, temporal, and contextual constraints. Results from our preliminary user study indicate that the approach can successfully resolve a wide variety of referring expressions, from simple to complex and from precise to ambiguous.
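
The idea of choosing referents by jointly scoring semantic, temporal, and contextual constraints can be illustrated with a toy sketch. This is not the paper's graph-matching algorithm: it is a brute-force search over assignments, and every name, weight, and scoring formula in it (`Expression`, `Candidate`, `pair_score`, `resolve`, the 0.5/0.5 weighting, the exponential time decay) is a hypothetical choice made only for illustration. It also assumes that each referring expression maps to a distinct object.

```python
from dataclasses import dataclass
from itertools import permutations
from math import exp
from typing import Optional


@dataclass
class Expression:
    text: str
    semantic_type: str  # type requested by the expression; "any" if unspecified
    time: float         # when the expression was spoken, in seconds


@dataclass
class Candidate:
    name: str
    semantic_type: str
    gesture_time: Optional[float]  # when the object was pointed at, if ever
    salience: float                # contextual salience in [0, 1] (assumed given)


def pair_score(expr: Expression, cand: Candidate) -> float:
    """Hypothetical compatibility score for one expression/object pair."""
    # Semantic constraint: types should agree unless the expression is generic.
    semantic = 1.0 if expr.semantic_type in (cand.semantic_type, "any") else 0.05
    # Temporal constraint: a gesture close in time to the speech scores higher.
    if cand.gesture_time is None:
        temporal = 0.1
    else:
        temporal = exp(-abs(expr.time - cand.gesture_time))
    # Contextual constraint: objects salient in prior discourse score higher.
    contextual = 0.1 + 0.9 * cand.salience
    return semantic * (0.5 * temporal + 0.5 * contextual)


def resolve(expressions, candidates):
    """Return the assignment of distinct candidates that maximizes the
    product of pair scores, found by exhaustive search (toy sizes only)."""
    best_score, best = -1.0, None
    for assignment in permutations(candidates, len(expressions)):
        score = 1.0
        for expr, cand in zip(expressions, assignment):
            score *= pair_score(expr, cand)
        if score > best_score:
            best_score, best = score, assignment
    if best is None:
        raise ValueError("fewer candidates than referring expressions")
    return {e.text: c.name for e, c in zip(expressions, best)}


# Made-up input: two spoken expressions, two pointing gestures, one object
# that is salient from earlier discourse but was never pointed at.
expressions = [
    Expression("this house", "house", time=1.0),
    Expression("that one", "any", time=2.5),
]
candidates = [
    Candidate("house_A", "house", gesture_time=1.1, salience=0.2),
    Candidate("house_B", "house", gesture_time=None, salience=0.9),
    Candidate("school_C", "school", gesture_time=2.4, salience=0.1),
]
print(resolve(expressions, candidates))
```

Scoring whole assignments rather than each expression in isolation is what lets the constraints interact: an object that is a plausible referent for two expressions is given to the one where it fits best overall. Exhaustive search grows factorially with the number of objects, so a practical system would need an approximate matching method instead.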