Improving generalisation and robustness of acoustic affect recognition

  • Authors:
  • Florian Eyben; Björn Schuller; Gerhard Rigoll

  • Affiliations:
  • Technische Universität München, Munich, Germany; JOANNEUM RESEARCH Forschungsgesellschaft mbH, Graz, Austria; Technische Universität München, Munich, Germany

  • Venue:
  • Proceedings of the 14th ACM international conference on Multimodal interaction
  • Year:
  • 2012

Abstract

Emotion recognition in real-life conditions faces several challenging factors that most studies on emotion recognition do not consider, such as background noise, varying recording levels, and the acoustic properties of the environment. This paper presents a systematic evaluation of the influence of background noise of various types and SNRs, as well as of recording-level variations, on the performance of automatic emotion recognition from speech. Both natural/spontaneous and acted/prototypical emotions are considered. Besides the well-known influence of additive noise, a significant influence of the recording level on recognition performance is observed. Multi-condition learning with various noise types and recording levels is proposed as a way to increase the robustness of methods based on standard acoustic feature sets and commonly used classifiers. It is compared to matched-conditions learning and is found to be almost on par for many settings.
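
As a rough illustration of the multi-condition idea described in the abstract, the sketch below generates noisy, level-shifted variants of a clean utterance for training. All function names, SNR values, and gain values here are hypothetical illustrations and are not taken from the paper; the paper's actual noise types, conditions, and acoustic feature extraction are not reproduced.

```python
"""Minimal sketch of multi-condition training-data augmentation:
additive noise mixed at a target SNR plus recording-level variation.
Names and condition values are illustrative assumptions, not the
paper's experimental setup."""
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `speech` so the mixture has the requested SNR in dB."""
    # Loop/trim the noise to match the length of the speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # SNR = 10*log10(p_speech / (scale^2 * p_noise))  =>  solve for scale.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10.0)))
    return speech + scale * noise

def vary_level(signal: np.ndarray, gain_db: float) -> np.ndarray:
    """Simulate a different recording level by applying an amplitude gain in dB."""
    return signal * 10 ** (gain_db / 20.0)

# Example: expand one clean utterance into a multi-condition training set.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)      # stand-in for a 1 s speech signal at 16 kHz
babble = rng.standard_normal(16000)     # stand-in for a recorded noise signal
training_variants = [
    vary_level(mix_at_snr(clean, babble, snr), gain)
    for snr in (0.0, 5.0, 10.0, 20.0)   # hypothetical additive-noise conditions
    for gain in (-6.0, 0.0, 6.0)        # hypothetical recording-level conditions
]
```

The design contrast is that matched-conditions learning trains one model per noise/level condition and assumes the test condition is known, whereas multi-condition learning trains a single model on the pooled variants, trading a small amount of per-condition accuracy for robustness when the test condition is unknown.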