Cross-Corpus Acoustic Emotion Recognition: Variances and Strategies

Authors:
Björn Schuller;Bogdan Vlasenko;Florian Eyben;Martin Wöllmer;André Stuhlsatz;Andreas Wendemuth;Gerhard Rigoll
Affiliations:
Technische Universität München, München;Otto-von-Guericke-Universität (OVGU), Magdeburg;Technische Universität München, München;Technische Universität München, München;University of Applied Sciences Düsseldorf, Düsseldorf;Otto-von-Guericke-Universität (OVGU), Magdeburg;Technische Universität München, München
Venue:
IEEE Transactions on Affective Computing
Year:
2010

Citing 0
Cited 9

Affective speaker state analysis in the presence of reverberation
International Journal of Speech Technology

A multitask approach to continuous five-dimensional affect sensing in natural speech
ACM Transactions on Interactive Intelligent Systems (TiiS) - Special Issue on Affective Interaction in Natural Environments

Classification of emotional speech using 3DEC hierarchical classifier
Speech Communication

Paralinguistics in speech and language - State-of-the-art and the challenge
Computer Speech and Language

On the development of an automatic voice pleasantness classification and intensity estimation system
Computer Speech and Language

Improving generalisation and robustness of acoustic affect recognition
Proceedings of the 14th ACM international conference on Multimodal interaction

Ten recent trends in computational paralinguistics
COST'11 Proceedings of the 2011 international conference on Cognitive Behavioural Systems

Compensating for speaker or lexical variabilities in speech for emotion recognition
Speech Communication

Modeling phonetic pattern variability in favor of the creation of robust emotion classifiers for real-life applications
Computer Speech and Language

Abstract

As the recognition of emotion from speech has matured to a degree where it becomes applicable in real-life settings, it is time for a realistic view of obtainable performance. Most studies tend to overestimate in this respect: acted data is often used rather than spontaneous data, results are reported on preselected prototypical data, and true speaker-disjunctive partitioning is still less common than simple cross-validation. Even speaker-disjunctive evaluation gives only limited insight into the generalization ability of today's emotion recognition engines, since the training and test data used for system development tend to be similar in recording conditions, noise overlay, language, and types of emotions. A considerably more realistic impression can be gained by inter-set evaluation. We therefore show results for six standard databases in a cross-corpus evaluation experiment, which could also help in judging the prospects of adding further training resources to overcome the data sparseness typical of the field. To better cope with the observed high variances, different types of normalization are investigated. In total, 1.8k individual evaluations indicate that inter-corpus testing is crucially inferior in performance to intra-corpus testing.
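
The cross-corpus protocol summarized in the abstract can be illustrated with a minimal sketch: train on one corpus, test on another, and compare results with and without normalization. Everything concrete below is an assumption made for illustration and is not taken from the paper: the toy two-feature corpora, the nearest-centroid classifier, per-corpus z-normalization as the normalization type, and unweighted average recall as the score. The names `z_normalize`, `cross_corpus_scores`, `corpus_a`, and `corpus_b` are hypothetical.

```python
from itertools import permutations
from statistics import mean, pstdev


def z_normalize(rows):
    """Standardize each feature column to zero mean and unit variance
    using statistics of this group of rows only (assumed: one corpus)."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard against constant features
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def fit_centroids(rows, labels):
    """Stand-in classifier (an assumption): one mean vector per class."""
    centroids = {}
    for label in set(labels):
        members = [r for r, l in zip(rows, labels) if l == label]
        centroids[label] = [mean(c) for c in zip(*members)]
    return centroids


def predict(centroids, row):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for label in set(y_true):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return mean(recalls)


def cross_corpus_scores(corpora, normalize=True):
    """Train on one corpus and test on another, for every ordered pair."""
    scores = {}
    for train_name, test_name in permutations(corpora, 2):
        x_train, y_train = corpora[train_name]
        x_test, y_test = corpora[test_name]
        if normalize:
            # Each corpus is normalized with its own (label-free) statistics.
            x_train, x_test = z_normalize(x_train), z_normalize(x_test)
        model = fit_centroids(x_train, y_train)
        y_pred = [predict(model, row) for row in x_test]
        scores[(train_name, test_name)] = unweighted_average_recall(y_test, y_pred)
    return scores


# Toy data (invented): same class structure, but corpus_b has a large
# feature offset, standing in for different recording conditions.
corpora = {
    "corpus_a": ([[1.0, 2.0], [1.2, 2.1], [3.0, 4.0], [3.1, 4.2]],
                 ["neg", "neg", "pos", "pos"]),
    "corpus_b": ([[11.0, 22.0], [11.4, 22.3], [13.0, 24.0], [13.2, 24.1]],
                 ["neg", "neg", "pos", "pos"]),
}

raw_scores = cross_corpus_scores(corpora, normalize=False)
norm_scores = cross_corpus_scores(corpora, normalize=True)
```

On this toy data the unnormalized scores sit at chance level (0.5) in both directions, because the offset between corpora dominates the class difference, while per-corpus normalization restores a score of 1.0. Real corpora differ in far more than a constant offset, so this only shows the mechanics of the evaluation loop, not an expected effect size.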