Fusion of audio- and visual cues for real-life emotional human robot interaction
DAGM'11: Proceedings of the 33rd International Conference on Pattern Recognition
Recognition of emotions from multimodal cues is of fundamental interest for the design of many adaptive interfaces in human-machine and human-robot interaction, as it provides a means to incorporate non-verbal feedback into the course of interaction. Humans express their emotional state largely unconsciously, exploiting their various natural communication modalities. In this paper, we present a first study on the multimodal recognition of emotions from auditory and visual cues for interaction interfaces. We recognize seven classes of basic emotions by means of visual analysis of talking faces. In parallel, the audio signal is analyzed on the basis of the intonation of the verbal articulation. We compare the performance of state-of-the-art recognition systems on the DaFEx database for both complementary modalities and discuss the results with regard to the theoretical background and possible fusion schemes in real-world multimodal interfaces.
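One widely used family of fusion schemes of the kind the abstract alludes to is decision-level (late) fusion, in which each modality's classifier outputs a posterior distribution over the emotion classes and the distributions are combined by a convex weighted sum. The sketch below illustrates this generic scheme only; it is not the authors' method, and the emotion label set, the weight value, and the function name late_fusion are assumptions for illustration.

import numpy as np

# Seven basic emotion classes; the exact label set and ordering used with
# the DaFEx database is an assumption here.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def late_fusion(p_visual: np.ndarray, p_audio: np.ndarray, w_visual: float = 0.5) -> np.ndarray:
    """Combine per-modality class posteriors with a convex weighted sum.

    p_visual, p_audio: arrays of shape (7,) holding class probabilities
    from the facial-expression and intonation classifiers, respectively.
    w_visual: trust placed in the visual channel (1 - w_visual for audio).
    """
    fused = w_visual * p_visual + (1.0 - w_visual) * p_audio
    return fused / fused.sum()  # renormalize to a proper distribution

# Example: visual cues favor "happiness", audio leans toward "surprise".
p_vis = np.array([0.05, 0.05, 0.05, 0.55, 0.05, 0.20, 0.05])
p_aud = np.array([0.05, 0.05, 0.10, 0.25, 0.05, 0.45, 0.05])
fused = late_fusion(p_vis, p_aud, w_visual=0.6)
print(EMOTIONS[int(np.argmax(fused))])  # -> "happiness"

In practice the weight w_visual would be tuned on held-out data, since the relative reliability of the two channels varies with recording conditions; this is one of the trade-offs a comparison of the two modalities, as reported in the paper, can inform.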