Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th International Conference on Multimodal Interfaces.
A first evaluation study of a database of kinetic facial expressions (DaFEx). In Proceedings of the 7th International Conference on Multimodal Interfaces (ICMI '05).
Toward multimodal fusion of affective cues. In Proceedings of the 1st ACM International Workshop on Human-Centered Multimedia.
Audio-visual emotion recognition in adult attachment interview. In Proceedings of the 8th International Conference on Multimodal Interfaces.
Modeling naturalistic affective states via facial and vocal expressions recognition. In Proceedings of the 8th International Conference on Multimodal Interfaces.
ENCARA2: Real-time detection of multiple faces at different resolutions in video streams. Journal of Visual Communication and Image Representation.
EmoVoice -- A Framework for Online Recognition of Emotions from Voice. In Proceedings of the 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems (PIT '08).
A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Evaluation and Discussion of Multi-modal Emotion Recognition
Proceedings of the 2009 Second International Conference on Computer and Electrical Engineering (ICCEE '09), Volume 01
Recognition of emotions from multimodal cues is of fundamental interest for the design of many adaptive interfaces in human-machine interaction (HMI) in general and human-robot interaction (HRI) in particular, as it provides a means to incorporate non-verbal feedback into the course of interaction. Humans express their emotional and affective state largely unconsciously, exploiting natural communication modalities such as body language, facial expression and prosodic intonation. In order to achieve applicability in realistic HRI settings, we develop person-independent affective models. In this paper, we present a study on the multimodal recognition of emotions from such auditory and visual cues for interaction interfaces. We recognize six basic emotion classes plus a neutral class for talking persons; the focus lies on the simultaneous online visual and acoustic analysis of speaking faces. A probabilistic decision-level fusion scheme based on Bayesian networks is applied to benefit from the complementary information in the acoustic and the visual cues. We compare the performance of our state-of-the-art recognition systems for the separate modalities against the improved results obtained after applying our fusion scheme, both on the DaFEx database and on real-life data captured directly from a robot. We furthermore discuss the results with regard to the theoretical background and future applications.
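The decision-level fusion described in the abstract can be illustrated with a minimal sketch. Assuming each unimodal classifier emits class posteriors and the modalities are conditionally independent given the emotion class (a naive-Bayes simplification of the paper's Bayesian-network scheme), the fused posterior is proportional to the product of the unimodal posteriors divided by the class prior. The class labels and probability values below are illustrative, not taken from the paper.

# A minimal sketch of probabilistic decision-level fusion under a
# conditional-independence (naive Bayes) assumption. All numbers are
# hypothetical classifier outputs, not results from the paper.
import numpy as np

# Six basic emotion classes plus neutral, as in the abstract.
CLASSES = ["anger", "disgust", "fear", "happiness",
           "sadness", "surprise", "neutral"]

def fuse_posteriors(p_audio, p_video, prior):
    """Fuse per-modality posteriors P(c|audio) and P(c|video).

    Under conditional independence given the class,
    P(c|a,v) is proportional to P(c|a) * P(c|v) / P(c).
    """
    joint = p_audio * p_video / prior
    return joint / joint.sum()  # renormalize to a proper distribution

prior   = np.full(len(CLASSES), 1.0 / len(CLASSES))  # uniform class prior
p_audio = np.array([0.10, 0.05, 0.05, 0.40, 0.10, 0.10, 0.20])
p_video = np.array([0.05, 0.05, 0.10, 0.55, 0.05, 0.10, 0.10])

fused = fuse_posteriors(p_audio, p_video, prior)
print(CLASSES[int(np.argmax(fused))])  # -> "happiness"

In this sketch the fused estimate sharpens the decision when both modalities weakly agree, which is the intended benefit of combining complementary acoustic and visual cues; the paper's actual Bayesian network may model richer dependencies between the modalities.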