The GMM-SVM Supervector Approach for the Recognition of the Emotional Status from Speech

Authors:
Friedhelm Schwenker;Stefan Scherer;Yasmine M. Magdi;Günther Palm
Affiliations:
Institute of Neural Information Processing, University of Ulm, Ulm, Germany 89069;Institute of Neural Information Processing, University of Ulm, Ulm, Germany 89069;Computer Science and Engineering Department, German University in Cairo, Heliopolis, Egypt 11341;Institute of Neural Information Processing, University of Ulm, Ulm, Germany 89069
Venue:
ICANN '09 Proceedings of the 19th International Conference on Artificial Neural Networks: Part I
Year:
2009


Abstract

Emotion recognition from speech is an important field of research in human-machine interfaces and has various applications, for instance in call centers. In the proposed classifier system, RASTA-PLP (perceptual linear prediction) features are extracted from the speech signals. The first step is to compute a universal background model (UBM) that represents the general structure of the underlying feature space of speech signals. This UBM is modeled as a Gaussian mixture model (GMM). Once the UBM has been computed, the sequence of feature vectors extracted from an utterance is used to re-train the UBM. The mean vectors of the resulting GMM are extracted and concatenated into so-called GMM supervectors, which are then passed to a support vector machine classifier. The overall system has been evaluated using utterances from the public Berlin emotional database. With the proposed features, an utterance-based recognition rate of 79% has been achieved, which is close to human performance on this database.
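The supervector construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the UBM is re-trained, so the sketch assumes a few EM iterations that update only the component means, with the UBM weights and diagonal variances held fixed. The function names, the two-component toy UBM, and the two-dimensional frames are all hypothetical stand-ins for a trained UBM and real RASTA-PLP features. The final SVM step is omitted.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given frame x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def adapt_means(frames, weights, means, variances, n_iter=5):
    """Re-estimate only the means on one utterance, starting from the UBM.

    Assumption: weights and variances stay at their UBM values.
    """
    means = [list(m) for m in means]  # copy so the UBM itself is untouched
    dim = len(means[0])
    for _ in range(n_iter):
        counts = [0.0] * len(means)
        sums = [[0.0] * dim for _ in means]
        for x in frames:
            post = responsibilities(x, weights, means, variances)
            for k, r in enumerate(post):
                counts[k] += r
                for d in range(dim):
                    sums[k][d] += r * x[d]
        for k in range(len(means)):
            if counts[k] > 1e-8:  # keep the old mean if a component got no support
                means[k] = [s / counts[k] for s in sums[k]]
    return means


def supervector(means):
    """Concatenate all component means into one fixed-length vector."""
    return [value for mean in means for value in mean]


# Hypothetical toy UBM: two components in a two-dimensional feature space.
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0, 0.0], [5.0, 5.0]]
ubm_variances = [[1.0, 1.0], [1.0, 1.0]]

# Hypothetical utterance: frames scattered around (1, 1) and (6, 4).
offsets = [(-0.5, -0.5), (0.5, 0.5), (-0.5, 0.5), (0.5, -0.5)]
utterance = [(1.0 + dx, 1.0 + dy) for dx, dy in offsets]
utterance += [(6.0 + dx, 4.0 + dy) for dx, dy in offsets]

adapted = adapt_means(utterance, ubm_weights, ubm_means, ubm_variances)
sv = supervector(adapted)
print(len(sv), [round(v, 3) for v in sv])
```

Every utterance, whatever its length, is mapped to a vector of the same dimension (number of components times feature dimension), which is what allows a standard classifier to be applied. In a fuller pipeline, one such vector per utterance together with its emotion label would be passed to an SVM, for example scikit-learn's `SVC`; that choice of library is a suggestion, not something the abstract states.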