Cognitive principles in robust multimodal interpretation

Authors:
Joyce Y. Chai;Zahar Prasov;Shaolin Qu
Affiliations:
Department of Computer Science and Engineering, Michigan State University, East Lansing, MI;Department of Computer Science and Engineering, Michigan State University, East Lansing, MI;Department of Computer Science and Engineering, Michigan State University, East Lansing, MI
Venue:
Journal of Artificial Intelligence Research
Year:
2006

Abstract

Multimodal conversational interfaces provide a natural means for users to communicate with computer systems through multiple modalities such as speech and gesture. Building effective multimodal interfaces requires automated interpretation of users' multimodal inputs. Inspired by earlier investigations of cognitive status in multimodal human-machine interaction, we have developed a greedy algorithm for interpreting user referring expressions (i.e., multimodal reference resolution). This algorithm incorporates the cognitive principles of Conversational Implicature and the Givenness Hierarchy, and applies constraints from various sources (e.g., temporal, semantic, and contextual) to resolve references. Our empirical results show the advantage of this algorithm in efficiently resolving a variety of user references. Because of its simplicity and generality, this approach has the potential to improve the robustness of multimodal input interpretation.
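To make the idea of greedy reference resolution under constraints more concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. Every name in it (Expression, Candidate, SOURCE_PRIORITY, compatible, resolve), the scoring scheme, the two-second temporal window, and the rule that gesture candidates are tried before conversation-history candidates are assumptions made for illustration only. It shows one plausible way a greedy matcher could combine a semantic constraint (type compatibility), a temporal constraint (gesture and speech close in time), and a contextual fallback (objects from the prior conversation).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Expression:
    text: str                 # the referring expression as spoken
    obj_type: Optional[str]   # semantic type it names, None for a bare pronoun
    time: float               # hypothetical timestamp of the expression (seconds)


@dataclass
class Candidate:
    obj_id: str
    obj_type: str
    source: str               # "gesture" or "history" (assumed labels)
    score: float              # hypothetical salience / selection score in [0, 1]
    time: float = 0.0         # hypothetical timestamp, used for gestures only


# Assumption: candidates selected by gesture are tried before candidates
# that are only available from the conversation history.
SOURCE_PRIORITY = {"gesture": 0, "history": 1}


def compatible(expr: Expression, cand: Candidate, max_gap: float = 2.0) -> bool:
    """Apply a semantic and a temporal constraint (both simplified)."""
    if expr.obj_type is not None and expr.obj_type != cand.obj_type:
        return False
    if cand.source == "gesture" and abs(expr.time - cand.time) > max_gap:
        return False
    return True


def resolve(expressions: List[Expression],
            candidates: List[Candidate]) -> Dict[str, str]:
    """Greedily pair each expression with at most one unused candidate."""
    pairs = [
        (SOURCE_PRIORITY[c.source], -c.score, i, j)
        for i, e in enumerate(expressions)
        for j, c in enumerate(candidates)
        if compatible(e, c)
    ]
    pairs.sort()  # best source first, then highest score

    assigned: Dict[int, str] = {}
    used = set()
    for _, _, i, j in pairs:
        if i in assigned or j in used:
            continue  # greedy: never revisit an earlier decision
        assigned[i] = candidates[j].obj_id
        used.add(j)
    return {expressions[i].text: obj for i, obj in assigned.items()}


exprs = [
    Expression("this house", "house", time=1.0),
    Expression("it", None, time=6.0),
]
cands = [
    Candidate("h1", "house", "gesture", score=0.9, time=1.2),
    Candidate("t1", "town", "gesture", score=0.4, time=1.2),
    Candidate("h2", "house", "history", score=0.8),
]
result = resolve(exprs, cands)
```

In this toy input, "this house" is paired with the gestured house h1, while the later pronoun "it" falls outside the assumed temporal window of the gesture and is paired with the house h2 from the conversation history. A greedy pass like this never backtracks, which keeps it simple and fast but means an early pairing can block a better overall assignment.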