Salience modeling based on non-verbal modalities for spoken language understanding

  • Authors:
  • Shaolin Qu; Joyce Y. Chai

  • Affiliations:
  • Michigan State University, East Lansing, MI; Michigan State University, East Lansing, MI

  • Venue:
  • Proceedings of the 8th International Conference on Multimodal Interfaces
  • Year:
  • 2006

Abstract

Previous studies have shown that, in multimodal conversational systems, fusing information from multiple modalities can improve overall input interpretation through mutual disambiguation. Inspired by these findings, this paper investigates the use of non-verbal modalities, in particular deictic gesture, in spoken language processing. Our assumption is that, during multimodal conversation, a user's deictic gestures on the graphical display signal the part of the underlying domain model that is salient at that point of the interaction. This salient domain model can then be used to constrain hypotheses for spoken language processing. Based on this assumption, the paper examines different configurations of salience-driven language models (e.g., n-gram and probabilistic context-free grammar) applied at different stages of spoken language processing. Our empirical results demonstrate the potential of integrating salience models based on non-verbal modalities into spoken language understanding.
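
The abstract does not give implementation details, but the salience-driven n-gram idea can be illustrated with a minimal sketch: a base bigram model is linearly interpolated with a salience distribution that boosts words related to the object a deictic gesture just indicated, and the mixed model rescores an n-best list from the recognizer. All identifiers below (BASE_BIGRAMS, SALIENT_VOCAB, LAMBDA, etc.) and the specific interpolation scheme are hypothetical, not the authors' implementation.

```python
# Hedged sketch: gesture-driven salience as a bias on a bigram language model
# used for n-best rescoring. Probabilities and vocabularies are toy values.

import math

# Hypothetical base bigram probabilities P(w2 | w1), estimated elsewhere.
BASE_BIGRAMS = {
    ("show", "me"): 0.4, ("me", "this"): 0.2, ("me", "the"): 0.3,
    ("this", "house"): 0.5, ("the", "price"): 0.4, ("this", "horse"): 0.1,
}

# Words associated with the object the user just pointed at (e.g. a house
# icon on the display); this stands in for the salient part of the domain
# model signaled by the deictic gesture.
SALIENT_VOCAB = {"house", "price", "bedroom"}

LAMBDA = 0.7   # interpolation weight for the base model (assumed value)
FLOOR = 1e-4   # probability floor for unseen bigrams


def salience_prob(word: str) -> float:
    """Uniform boost over the salient vocabulary, tiny mass elsewhere."""
    return 1.0 / len(SALIENT_VOCAB) if word in SALIENT_VOCAB else FLOOR


def bigram_prob(prev: str, word: str) -> float:
    """Interpolate the base bigram model with the salience model."""
    base = BASE_BIGRAMS.get((prev, word), FLOOR)
    return LAMBDA * base + (1.0 - LAMBDA) * salience_prob(word)


def score(hypothesis: list[str]) -> float:
    """Log-probability of a recognizer hypothesis under the mixed model."""
    return sum(math.log(bigram_prob(p, w))
               for p, w in zip(hypothesis, hypothesis[1:]))


if __name__ == "__main__":
    # Two acoustically similar hypotheses; the gesture toward a house
    # makes the first one more plausible after rescoring.
    nbest = [["show", "me", "this", "house"],
             ["show", "me", "this", "horse"]]
    print("selected:", " ".join(max(nbest, key=score)))
```

In this toy setup the gesture-primed salience term lifts "house" over the acoustically similar "horse"; the paper explores analogous salience constraints for both n-gram and PCFG models at different processing stages.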