A salience driven approach to robust input interpretation in multimodal conversational systems

Authors:
Joyce Y. Chai;Shaolin Qu
Affiliations:
Michigan State University, East Lansing, MI;Michigan State University, East Lansing, MI
Venue:
HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Year:
2005


Abstract

To improve robustness in multimodal input interpretation, this paper presents a new salience-driven approach. The approach is based on the observation that, during multimodal conversation, deictic gestures (e.g., pointing or circling) on a graphical display can signal which part of the application's physical world (i.e., the representation of the domain and task) is salient during the communication. This salient part of the physical world primes what users tend to communicate in speech and can in turn be used to constrain hypotheses for spoken language understanding, thus improving overall input interpretation. Our experimental results indicate the potential of this approach to reduce word error rate and improve concept identification in multimodal conversation.
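
The idea of letting gesture-derived salience constrain speech hypotheses can be illustrated with a minimal sketch. This is not the authors' model: it assumes a recognizer that returns an n-best list with log scores, and every name, vocabulary, score, and parameter below (`focus_mass`, `mix`, `weight`, the object identifiers) is hypothetical. Gestured objects receive most of the salience mass, words associated with salient objects receive a higher prior, and each hypothesis is rescored by adding that prior to its recognizer score.

```python
import math

def salience_from_gesture(selected, all_objects, focus_mass=0.8):
    """Return a salience distribution over objects (hypothetical scheme).

    Gestured objects share `focus_mass`; the remaining objects share the rest.
    With no gesture, salience is uniform.
    """
    chosen = [o for o in all_objects if o in set(selected)]
    others = [o for o in all_objects if o not in chosen]
    if not chosen:
        return {o: 1.0 / len(all_objects) for o in all_objects}
    if not others:
        focus_mass = 1.0
    salience = {o: focus_mass / len(chosen) for o in chosen}
    salience.update({o: (1.0 - focus_mass) / len(others) for o in others})
    return salience

def primed_word_prob(word, salience, object_words, vocab_size, mix=0.5):
    """Mix a salience-weighted word probability with a uniform background."""
    primed = sum(
        p / len(object_words[obj])
        for obj, p in salience.items()
        if word in object_words[obj]
    )
    return mix * primed + (1.0 - mix) / vocab_size

def rescore(nbest, salience, object_words, vocab_size, weight=1.0):
    """Pick the hypothesis with the best recognizer score plus salience prior."""
    scored = []
    for text, asr_logp in nbest:
        prior = sum(
            math.log(primed_word_prob(w, salience, object_words, vocab_size))
            for w in text.split()
        )
        scored.append((asr_logp + weight * prior, text))
    return max(scored)[1]

# Hypothetical display objects and the words associated with each.
OBJECT_WORDS = {
    "house_3": {"house", "price", "size"},
    "town_2": {"town", "population", "schools"},
}
# Hypothetical n-best list: the misrecognized hypothesis scores slightly higher.
NBEST = [
    ("show the prize of this house", -10.0),
    ("show the price of this house", -11.0),
]

if __name__ == "__main__":
    sal = salience_from_gesture(["house_3"], list(OBJECT_WORDS))
    print(rescore(NBEST, sal, OBJECT_WORDS, vocab_size=20))
```

In this toy setup, a gesture on `house_3` raises the prior of "price" enough to overturn the recognizer's preference for "prize", while a gesture on `town_2` does not. A real system would replace the uniform background with a trained language model and would tune the interpolation weights on held-out data.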