Performance evaluation and error analysis for multimodal reference resolution in a conversation system

Authors:
Joyce Y. Chai;Zahar Prasov;Pengyu Hong
Affiliations:
Michigan State University, East Lansing, MI;Michigan State University, East Lansing, MI;Harvard University Cambridge, MA
Venue:
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers
Year:
2004

Citing 15
Cited 3

Automatic referent resolution of deictic and anaphoric expressions

Computational Linguistics
A Graduated Assignment Algorithm for Graph Matching

IEEE Transactions on Pattern Analysis and Machine Intelligence
Integration and synchronization of input modes during multimodal human-computer interaction

Proceedings of the ACM SIGCHI Conference on Human factors in computing systems
QuickSet: multimodal interaction for distributed applications

MULTIMEDIA '97 Proceedings of the fifth ACM international conference on Multimedia
Natural language with integrated deictic and graphic gestures

Readings in intelligent user interfaces
Embodiment in conversational interfaces: Rea

Proceedings of the SIGCHI conference on Human Factors in Computing Systems
Cognitive Status and Form of Reference in Multimodal Human-Computer Interaction

Proceedings of the Seventeenth National Conference on Artificial Intelligence and Twelfth Conference on Innovative Applications of Artificial Intelligence
“Put-that-there”: Voice and gesture at the graphics interface

SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques
Gesture Patterns during Speech Repairs

ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
Context-Based Multimodal Input Understanding in Conversational Systems

ICMI '02 Proceedings of the 4th IEEE International Conference on Multimodal Interfaces
A probabilistic approach to reference resolution in multimodal user interfaces

Proceedings of the 9th international conference on Intelligent user interfaces
Unification-based multimodal parsing

COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 1
Finite-state multimodal parsing and understanding

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
MATCH: an architecture for multimodal dialogue systems

ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics
Grounded semantic composition for visual scenes

Journal of Artificial Intelligence Research

Linguistic theories in efficient multimodal reference resolution: an empirical investigation

Proceedings of the 10th international conference on Intelligent user interfaces
Optimization in multimodal interpretation

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Cognitive principles in robust multimodal interpretation

Journal of Artificial Intelligence Research

Quantified Score

Hi-index	0.00

Visualization

Abstract

Multimodal reference resolution is a process that automatically identifies what users refer to during multimodal human-machine conversation. Given the substantial work on multimodal reference resolution; it is important to evaluate the current state of the art, understand the limitations, and identify directions for future improvement. We conducted a series of user studies to evaluate the capability of reference resolution in a multimodal conversation system. This paper analyzes the main error sources during real-time human-machine interaction and presents key strategies for designing robust multimodal reference resolution algorithms.