An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech

Authors:
Mohammad Shami;Werner Verhelst
Affiliations:
Laboratory for Digital Speech and Audio Processing, Department of ETRO-DSSP, Vrije Universiteit Brussel, Interdisciplinary Institute for Broadband Technology, Pleinlaan 2, 1050 Brussels, Belgium;Laboratory for Digital Speech and Audio Processing, Department of ETRO-DSSP, Vrije Universiteit Brussel, Interdisciplinary Institute for Broadband Technology, Pleinlaan 2, 1050 Brussels, Belgium
Venue:
Speech Communication
Year:
2007

Abstract

In this study, the robustness of approaches to the automatic classification of emotions in speech is addressed. Among the many types of emotions that exist, two groups are considered: adult-to-adult acted vocal expressions of common emotions such as happiness, sadness, and anger, and adult-to-infant vocal expressions of affective intent, also known as "motherese". Specifically, we estimate the generalization capability of two feature extraction approaches: the approach developed for Sony's robotic dog AIBO (AIBO) and the segment-based approach (SBA) of [Shami, M., Kamel, M., 2005. Segment-based approach to the recognition of emotions in speech. In: IEEE Conf. on Multimedia and Expo (ICME05), Amsterdam, The Netherlands]. Three machine learning approaches are considered: K-nearest neighbors (KNN), support vector machines (SVM), and Ada-boosted decision trees. Four emotional speech databases are employed: Kismet, BabyEars, Danish, and Berlin. Single-corpus experiments show that the AIBO and SBA feature extraction approaches are competitive on the four databases considered and that their performance is comparable with previously published results on the same databases. The best choice of machine learning algorithm seems to depend on the feature extraction approach considered.
Multi-corpus experiments are performed with the Kismet-BabyEars and the Danish-Berlin database pairs, which contain parallel emotional classes. Automatic clustering of the emotional classes in the database pairs shows that the patterns behind the emotions in the Kismet-BabyEars pair are less database dependent than those in the Danish-Berlin pair. In off-corpus testing, the classifier is trained on one database of a pair and tested on the other; this provides little improvement over baseline classification. In integrated-corpus testing, however, the classifier is trained on the merged databases, and this gives promisingly robust classification results, which suggest that emotional corpora with parallel emotion classes recorded under different conditions can be used to construct a single classifier capable of distinguishing the emotions in the merged corpora. Such a classifier is more robust than a classifier learned on a single corpus, as it can recognize more varied expressions of the same emotional classes. These findings suggest that the existing approaches for the classification of emotions in speech are efficient enough to handle larger amounts of training data without any reduction in classification accuracy.
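
The three evaluation settings named in the abstract (single-corpus, off-corpus, and integrated-corpus testing) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two-dimensional synthetic features stand in for AIBO or SBA features, the labels "anger" and "sadness" and the `shift` parameter (a crude stand-in for differing recording conditions) are hypothetical, a plain hold-out split replaces whatever validation scheme the authors used, and the small KNN routine is a generic one rather than the authors' implementation.

```python
import math
import random
from collections import Counter


def make_corpus(n_per_class, shift, seed):
    """Build a synthetic 2-D corpus with two emotion classes (illustrative only).

    `shift` moves both class centers to mimic corpus-specific recording conditions.
    """
    rng = random.Random(seed)
    centers = {"anger": (2.0, 2.0), "sadness": (-2.0, -2.0)}  # hypothetical classes
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            point = (rng.gauss(cx + shift, 1.0), rng.gauss(cy - shift, 1.0))
            data.append((point, label))
    rng.shuffle(data)
    return data


def knn_predict(train, x, k=5):
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def split(data, frac=0.7):
    """Simple hold-out split (an assumption; not the paper's validation scheme)."""
    cut = int(len(data) * frac)
    return data[:cut], data[cut:]


def evaluate_pair(corpus_a, corpus_b, k=5):
    """Compare single-corpus, off-corpus, and integrated-corpus testing for one pair."""
    a_train, a_test = split(corpus_a)
    b_train, b_test = split(corpus_b)
    return {
        # Train and test within the same corpus.
        "single_a": accuracy(a_train, a_test, k),
        "single_b": accuracy(b_train, b_test, k),
        # Off-corpus: train on one corpus of the pair, test on the other.
        "off_corpus_a_to_b": accuracy(corpus_a, corpus_b, k),
        "off_corpus_b_to_a": accuracy(corpus_b, corpus_a, k),
        # Integrated: train on the merged training parts, test on the merged test parts.
        "integrated": accuracy(a_train + b_train, a_test + b_test, k),
    }


corpus_a = make_corpus(n_per_class=40, shift=0.0, seed=1)
corpus_b = make_corpus(n_per_class=40, shift=1.5, seed=2)
for setting, score in evaluate_pair(corpus_a, corpus_b).items():
    print(f"{setting}: {score:.2f}")
```

The numbers printed by this sketch depend entirely on the synthetic data and say nothing about the results reported in the paper; the point is only the structure of the three protocols, in particular that off-corpus testing never sees the target corpus during training while integrated-corpus testing trains on both.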