Automatic detection of communication errors in conversational systems has been studied extensively in the speech community, but most previous work has relied on acoustic cues alone. Visual information has also been used to improve speech recognition in dialogue systems, yet only while the speaker is communicating vocally. A recent perceptual study showed that human observers can detect communication problems from visual footage of the speaker recorded during the system's reply. In this paper, we present work in progress toward a communication-error detector that exploits this visual cue. In the datasets we collected or acquired, facial motion features and head poses were estimated while users listened to the system's response, and these features were passed to a classifier that detects communication errors. Preliminary experiments show that the speaker's visual behavior during the system's reply is a useful cue, and that the accuracy of automatic detection approaches human performance.
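The pipeline the abstract describes — per-segment head-pose and facial-motion features, fed to a classifier that labels each listening segment as "error" or "normal" — can be illustrated with a minimal sketch. This is not the paper's actual method: the feature layout (three pose angles plus three motion magnitudes), the nearest-centroid rule, and the synthetic data are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical illustration (not the paper's pipeline): classify listening
# segments as error (1) vs. normal (0) from a 6-dimensional feature vector
# (3 head-pose angles + 3 facial-motion magnitudes) via nearest centroid.

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [statistics.fmean(col) for col in zip(*rows)]

def predict(x, centroids):
    """Label whose class centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(x, centroids[lab])))

# Synthetic segments: error segments are assumed (for illustration only)
# to shift all features by a fixed offset, e.g. more head motion.
random.seed(0)
def make_segment(error):
    base = 1.0 if error else -1.0
    return [random.gauss(base, 0.5) for _ in range(6)]

data = [(make_segment(lab), lab) for lab in [0, 1] * 20]
cents = {lab: centroid([x for x, l in data if l == lab]) for lab in (0, 1)}
preds = [predict(x, cents) for x, _ in data]
accuracy = sum(p == l for p, (_, l) in zip(preds, data)) / len(data)
```

A real detector would replace the synthetic vectors with features estimated by a face tracker and the centroid rule with a trained classifier evaluated on held-out data.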