Optimization in Multimodal Interpretation

  • Authors:
  • Joyce Y. Chai; Pengyu Hong; Michelle X. Zhou; Zahar Prasov

  • Affiliations:
  • Michigan State University, East Lansing, MI; Harvard University, Cambridge, MA; IBM T. J. Watson Research Center, Hawthorne, NY; Michigan State University, East Lansing, MI

  • Venue:
  • ACL '04: Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics
  • Year:
  • 2004

Abstract

In a multimodal conversation, the way users communicate with a system depends on the available interaction channels and the situated context (e.g., conversation focus, visual feedback). These dependencies form a rich set of constraints from various perspectives, such as temporal alignment between different modalities, coherence of the conversation, and domain semantics. There is strong evidence that the competition and ranking of these constraints are important for achieving an optimal interpretation. We have therefore developed an optimization approach to multimodal interpretation, in particular the interpretation of multimodal references. A preliminary evaluation indicates the effectiveness of this approach, especially for complex user inputs that involve multiple referring expressions in a speech utterance and multiple gestures.
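To make the idea of constraint competition and ranking concrete, the sketch below is a minimal, hypothetical Python illustration, not the authors' actual algorithm (which the abstract does not spell out). It scores each candidate pairing of referring expressions with gesture-indicated objects by a weighted sum of soft constraints (temporal alignment, semantic compatibility, and salience from the conversation focus) and picks the highest-ranked overall assignment. All data, names, and weights here are assumptions for illustration.

```python
from itertools import permutations

# Hypothetical inputs: two referring expressions from one speech
# utterance, and two gesture-indicated objects (illustrative only).
referring_exprs = [
    {"text": "this one", "time": 1.2, "sem_type": "house"},
    {"text": "that red house", "time": 2.8, "sem_type": "house"},
]
gestures = [
    {"object_id": "house_7", "time": 1.3, "sem_type": "house", "salience": 0.9},
    {"object_id": "house_2", "time": 2.6, "sem_type": "house", "salience": 0.4},
]

# Illustrative constraint weights: temporal alignment, semantic
# compatibility, and salience from the conversation focus.
W_TIME, W_SEM, W_SAL = 1.0, 2.0, 0.5

def constraint_score(expr, gesture):
    """Weighted sum of soft constraints for pairing one referring
    expression with one gesture-indicated object."""
    temporal = -abs(expr["time"] - gesture["time"])  # closer in time is better
    semantic = 1.0 if expr["sem_type"] == gesture["sem_type"] else -1.0
    return W_TIME * temporal + W_SEM * semantic + W_SAL * gesture["salience"]

def best_interpretation(exprs, gestures):
    """Let candidate one-to-one assignments compete; return the
    highest-ranked assignment and its total score."""
    best, best_score = None, float("-inf")
    for perm in permutations(gestures, len(exprs)):
        score = sum(constraint_score(e, g) for e, g in zip(exprs, perm))
        if score > best_score:
            best, best_score = list(zip(exprs, perm)), score
    return best, best_score

if __name__ == "__main__":
    assignment, total = best_interpretation(referring_exprs, gestures)
    for expr, gesture in assignment:
        print(f"{expr['text']!r} -> {gesture['object_id']} "
              f"(score {constraint_score(expr, gesture):+.2f})")
    print(f"total score: {total:.2f}")
```

With these toy weights, the temporally aligned pairing ("this one" → house_7, "that red house" → house_2) outranks the swapped one; in practice the constraint set and its ranking would be far richer than this sketch suggests.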