Emotion recognition from speech by combining databases and fusion of classifiers

Authors:
Iulia Lefter;Leon J. M. Rothkrantz;Pascal Wiggers;David A. Van Leeuwen
Affiliations:
Delft University of Technology, The Netherlands and The Netherlands Defense Academy;Delft University of Technology, The Netherlands and The Netherlands Defense Academy;Delft University of Technology, The Netherlands;TNO Human Factors, The Netherlands
Venue:
TSD'10 Proceedings of the 13th international conference on Text, speech and dialogue
Year:
2010

References (5)

The eNTERFACE'05 Audio-Visual Emotion Database. ICDEW '06 Proceedings of the 22nd International Conference on Data Engineering Workshops.
Automatic Classification of Expressiveness in Speech: A Multi-corpus Study. Speaker Classification II.
Automatic Recognition of Spontaneous Emotions in Speech Using Acoustic and Lexical Features. MLMI '08 Proceedings of the 5th international workshop on Machine Learning for Multimodal Interaction.
Whodunnit - Searching for the most important feature types signalling emotion-related user states in speech. Computer Speech and Language.
Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006. IEEE Transactions on Audio, Speech, and Language Processing.

Cited by (4)

EmoReSp: an online emotion recognizer based on speech. Proceedings of the 11th International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing on International Conference on Computer Systems and Technologies.
Addressing multimodality in overt aggression detection. TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue.
A comparative study on automatic audio-visual fusion for aggression detection using meta-information. Pattern Recognition Letters.
Towards estimating computer users' mood from interaction behaviour with keyboard and mouse. Frontiers of Computer Science: Selected Publications from Chinese Universities.

Abstract

We explore possibilities for enhancing the generality, portability and robustness of emotion recognition systems by combining databases and by fusion of classifiers. In a first experiment, we investigate the performance of an emotion detection system tested on a certain database given that it is trained on speech from either the same database, a different database or a mix of both. We observe that generally there is a drop in performance when the test database does not match the training material, but there are a few exceptions. Furthermore, the performance drops when a mixed corpus of acted databases is used for training and testing is carried out on real-life recordings. In a second experiment we investigate the effect of training multiple emotion detectors, and fusing these into a single detection system. We observe a drop in the Equal Error Rate (EER) from 19.0% on average for 4 individual detectors to 4.2% when fused using FoCal [1].
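
To make the second experiment's metric concrete, the following is a minimal sketch of how an Equal Error Rate can be estimated from detector scores and how several detectors' scores can be fused with a linear combination trained by logistic regression. It is an illustration only: the data are synthetic, the function names (`equal_error_rate`, `train_linear_fusion`) and all parameters are hypothetical, and the code does not use FoCal or reproduce the paper's detectors or its reported figures. It assumes that score-level fusion of the kind the abstract describes can be approximated by a weighted sum of detector scores plus an offset.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER: the point where miss and false-alarm rates are closest."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target_scores < t)            # targets rejected at threshold t
        false_alarm = np.mean(nontarget_scores >= t)  # non-targets accepted at threshold t
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, eer = gap, (miss + false_alarm) / 2.0
    return eer


def train_linear_fusion(scores, labels, steps=2000, lr=0.1):
    """Fit weights w and offset b so that scores @ w + b separates the classes.

    Plain gradient descent on the logistic loss; a stand-in for a real
    fusion toolkit, not an implementation of one.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    w = np.zeros(scores.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(scores @ w + b)))
        grad = p - labels
        w -= lr * (scores.T @ grad) / len(labels)
        b -= lr * grad.mean()
    return w, b


# Synthetic stand-in data (assumption): 4 detectors with independent noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=800)
scores = labels[:, None] + rng.normal(0.0, 1.0, size=(800, 4))

# Train the fusion on one half and evaluate on the other half.
train, test = slice(0, 400), slice(400, 800)
w, b = train_linear_fusion(scores[train], labels[train])

is_target = labels[test] == 1
individual_eers = [
    equal_error_rate(scores[test][is_target, k], scores[test][~is_target, k])
    for k in range(scores.shape[1])
]
fused = scores[test] @ w + b
fused_eer = equal_error_rate(fused[is_target], fused[~is_target])
mean_individual_eer = float(np.mean(individual_eers))

print(f"mean individual EER: {mean_individual_eer:.3f}")
print(f"fused EER:           {fused_eer:.3f}")
```

On this synthetic data the fused score tends to give a lower EER than the individual detectors because their errors are independent by construction; how much real detectors gain depends on how correlated their errors are.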