Analysis of emotion recognition using facial expressions, speech and multimodal information. In Proceedings of the 6th International Conference on Multimodal Interfaces.
A first evaluation study of a database of kinetic facial expressions (DaFEx). In Proceedings of the 7th International Conference on Multimodal Interfaces (ICMI '05).
Toward multimodal fusion of affective cues. In Proceedings of the 1st ACM International Workshop on Human-Centered Multimedia.
Audio-visual emotion recognition in adult attachment interview. In Proceedings of the 8th International Conference on Multimodal Interfaces.
Modeling naturalistic affective states via facial and vocal expressions recognition. In Proceedings of the 8th International Conference on Multimodal Interfaces.
ENCARA2: Real-time detection of multiple faces at different resolutions in video streams. Journal of Visual Communication and Image Representation.
EmoVoice -- A Framework for Online Recognition of Emotions from Voice. In Proceedings of the 4th IEEE Tutorial and Research Workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems (PIT '08).
A Survey of Affect Recognition Methods: Audio, Visual, and Spontaneous Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Evaluation and Discussion of Multi-modal Emotion Recognition
Proceedings of the 2009 Second International Conference on Computer and Electrical Engineering (ICCEE '09), Volume 01
Recognition of emotions from multimodal cues is of fundamental interest for the design of many adaptive interfaces in human-machine interaction (HMI) in general and human-robot interaction (HRI) in particular, as it provides a means to incorporate non-verbal feedback into the course of interaction. Humans express their emotional and affective state largely unconsciously, exploiting natural communication modalities such as body language, facial expression and prosodic intonation. In order to achieve applicability in realistic HRI settings, we develop person-independent affective models. In this paper, we present a study on the multimodal recognition of emotions from such auditory and visual cues for interaction interfaces. We recognize six basic emotion classes plus a neutral class for talking persons; the focus lies on the simultaneous online visual and acoustic analysis of speaking faces. A probabilistic decision-level fusion scheme based on Bayesian networks is applied to benefit from the complementary information in the acoustic and the visual cues. We compare the performance of our state-of-the-art recognition systems for the separate modalities against the improved results obtained after applying our fusion scheme, both on the DaFEx database and on real-life data captured directly from a robot. We furthermore discuss the results with regard to the theoretical background and future applications.
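The decision-level fusion described in the abstract can be illustrated with a minimal sketch. Assuming each unimodal classifier emits class posteriors and the modalities are conditionally independent given the emotion class (a naive-Bayes simplification of the paper's Bayesian-network scheme), the fused posterior is proportional to the product of the unimodal posteriors divided by the class prior. The class labels and probability values below are illustrative, not taken from the paper.

# A minimal sketch of probabilistic decision-level fusion under a
# conditional-independence (naive Bayes) assumption. All numbers are
# hypothetical classifier outputs, not results from the paper.
import numpy as np

# Six basic emotion classes plus neutral, as in the abstract.
CLASSES = ["anger", "disgust", "fear", "happiness",
           "sadness", "surprise", "neutral"]

def fuse_posteriors(p_audio, p_video, prior):
    """Fuse per-modality posteriors P(c|audio) and P(c|video).

    Under conditional independence given the class,
    P(c|a,v) is proportional to P(c|a) * P(c|v) / P(c).
    """
    joint = p_audio * p_video / prior
    return joint / joint.sum()  # renormalize to a proper distribution

prior   = np.full(len(CLASSES), 1.0 / len(CLASSES))  # uniform class prior
p_audio = np.array([0.10, 0.05, 0.05, 0.40, 0.10, 0.10, 0.20])
p_video = np.array([0.05, 0.05, 0.10, 0.55, 0.05, 0.10, 0.10])

fused = fuse_posteriors(p_audio, p_video, prior)
print(CLASSES[int(np.argmax(fused))])  # -> "happiness"

In this sketch the fused estimate sharpens the decision when both modalities weakly agree, which is the intended benefit of combining complementary acoustic and visual cues; the paper's actual Bayesian network may model richer dependencies between the modalities.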