In a multimodal conversational interface supporting speech and deictic gesture, deictic gestures on the graphical display have traditionally been used to identify user attention, for example, through reference resolution. Since the context of the identified attention can constrain the associated intention, our hypothesis is that deictic gestures can go beyond attention and contribute to intention recognition. Driven by this hypothesis, this paper systematically investigates the role of deictic gestures in intention recognition. We experiment with different model-based and instance-based methods of incorporating gestural information into intention recognition, and we examine the effects of utilizing gestural information at two different processing stages: the speech recognition stage and the language understanding stage. Our empirical results show that utilizing gestural information improves intention recognition, and that performance improves further when gestures are incorporated in both the speech recognition and language understanding stages rather than in either stage alone.
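To make the two processing stages concrete, here is a minimal sketch (not the paper's implementation; all function names, scores, and features are illustrative assumptions): gesture-derived salience first rescores the speech recognizer's n-best hypotheses, and an instance-based (1-nearest-neighbour) classifier then predicts the intention from combined speech and gesture features.

```python
# Hypothetical sketch of gesture-informed intention recognition.
# Stage 1: rescore n-best speech hypotheses using objects made salient
# by a deictic gesture. Stage 2: instance-based intent classification.
# Names, weights, and feature encodings are assumptions for illustration.

def rescore_hypotheses(nbest, salient_objects, weight=0.5):
    """Boost hypotheses that mention gesture-salient objects.

    nbest: list of (text, asr_score) pairs; salient_objects: object
    names selected by the deictic gesture on the display.
    """
    rescored = []
    for text, asr_score in nbest:
        overlap = sum(1 for obj in salient_objects if obj in text)
        rescored.append((text, asr_score + weight * overlap))
    return sorted(rescored, key=lambda h: h[1], reverse=True)

def classify_intention(features, training_examples):
    """1-nearest-neighbour intent classifier over binary feature
    vectors that combine speech and gesture cues (Hamming distance)."""
    def dist(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(training_examples, key=lambda ex: dist(ex[0], features))[1]

# Usage: a gesture on a "restaurant" icon promotes the matching hypothesis.
nbest = [("zoom to the rest area", 0.40), ("zoom to the restaurant", 0.35)]
best_text = rescore_hypotheses(nbest, salient_objects=["restaurant"])[0][0]

# Toy labelled instances: (feature vector, intention label).
train = [((1, 0, 1), "zoom"), ((0, 1, 0), "query_price")]
intent = classify_intention((1, 0, 0), train)
```

The sketch shows why combining both stages can outperform either alone: rescoring repairs recognition errors before understanding, while the classifier still exploits gesture features when the transcript is already correct.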