Automatic detection of communication errors in conversational systems has been studied extensively in the speech community, but most previous work has relied on acoustic cues alone. Visual information has also been used to improve speech recognition in dialogue systems, yet only while the speaker is communicating vocally. A recent perceptual study showed that human observers can detect communication problems from visual footage of the speaker recorded during the system's reply. In this paper, we present work in progress toward a communication-error detector that exploits this visual cue. In the datasets we collected or acquired, facial motion features and head poses were estimated while users listened to the system's response, and these features were passed to a classifier that detects communication errors. Preliminary experiments show that the speaker's visual behavior during the system's reply is a useful cue, and that the accuracy of automatic detection approaches human performance.
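The pipeline the abstract describes — per-segment head-pose and facial-motion features, fed to a classifier that labels each listening segment as "error" or "normal" — can be illustrated with a minimal sketch. This is not the paper's actual method: the feature layout (three pose angles plus three motion magnitudes), the nearest-centroid rule, and the synthetic data are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical illustration (not the paper's pipeline): classify listening
# segments as error (1) vs. normal (0) from a 6-dimensional feature vector
# (3 head-pose angles + 3 facial-motion magnitudes) via nearest centroid.

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [statistics.fmean(col) for col in zip(*rows)]

def predict(x, centroids):
    """Label whose class centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda lab: sum((a - b) ** 2
                                   for a, b in zip(x, centroids[lab])))

# Synthetic segments: error segments are assumed (for illustration only)
# to shift all features by a fixed offset, e.g. more head motion.
random.seed(0)
def make_segment(error):
    base = 1.0 if error else -1.0
    return [random.gauss(base, 0.5) for _ in range(6)]

data = [(make_segment(lab), lab) for lab in [0, 1] * 20]
cents = {lab: centroid([x for x, l in data if l == lab]) for lab in (0, 1)}
preds = [predict(x, cents) for x, _ in data]
accuracy = sum(p == l for p, (_, l) in zip(preds, data)) / len(data)
```

A real detector would replace the synthetic vectors with features estimated by a face tracker and the centroid rule with a trained classifier evaluated on held-out data.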