Emotion Classification of Audio Signals Using Ensemble of Support Vector Machines

Authors:
Taner Danisman;Adil Alpkocak
Affiliations:
Computer Engineering Department, Dokuz Eylul University, Izmir, Turkey 35160;Computer Engineering Department, Dokuz Eylul University, Izmir, Turkey 35160
Venue:
PIT '08 Proceedings of the 4th IEEE tutorial and research workshop on Perception and Interactive Technologies for Speech-Based Systems: Perception in Multimodal Dialogue Systems
Year:
2008

Abstract

This study presents an approach to emotion classification of speech utterances based on an ensemble of support vector machines (SVMs). We use feature-level fusion of MFCC, total energy, and F0 as the input feature vector, and chose the bagging method for classification. We also present a new emotional dataset based on the popular animated film Finding Nemo, in which emotions are strongly emphasized to hold the attention of the audience. Speech utterances were extracted directly from the video's audio channel, including all background noise. In total, 2054 utterances from 24 speakers were annotated by a group of volunteers using seven emotion categories, with the annotation concentrating on perceived emotion. We tested our approach on our newly developed dataset (EFN) as well as on the publicly available DES and EmoDB datasets. It achieved overall accuracies of 77.5% and 66.8% for four-class and five-class classification on the EFN dataset, respectively. In addition, it achieved 67.6% accuracy on DES (five classes) and 63.5% on EmoDB (seven classes) using an ensemble of SVMs with 10-fold cross-validation.
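
The pipeline described in the abstract (feature-level fusion followed by a bagged ensemble of SVMs) can be illustrated with a minimal, standard-library-only Python sketch. This is not the authors' implementation. The function names (`fuse`, `train_bagged_svms`, `predict_bagged`, and the helpers) are hypothetical. The base learner is assumed to be a linear SVM trained with Pegasos-style stochastic gradient descent and combined one-vs-rest for multiple classes, since the abstract does not state the kernel, the multi-class scheme, the ensemble size, or how the features are summarized per utterance. The data in the usage section is synthetic, and the 10-fold cross-validation step is omitted.

```python
import random
from collections import Counter


def fuse(mfcc, energy, f0):
    """Feature-level fusion: concatenate per-utterance descriptors into one vector."""
    return list(mfcc) + [energy, f0]


def train_binary_svm(X, y, lam=0.01, epochs=20, rng=None):
    """Linear SVM for labels in {+1, -1}, trained with Pegasos-style SGD (assumed learner)."""
    rng = rng or random.Random(0)
    w = [0.0] * (len(X[0]) + 1)              # last entry acts as the bias
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # decaying step size
            xi = list(X[i]) + [1.0]
            margin = y[i] * sum(a * b for a, b in zip(w, xi))
            w = [a * (1.0 - eta * lam) for a in w]        # regularisation shrink
            if margin < 1.0:                               # hinge loss is active
                w = [a + eta * y[i] * b for a, b in zip(w, xi)]
    return w


def train_multiclass_svm(X, y, rng):
    """One-vs-rest wrapper: one binary SVM per emotion label (assumed scheme)."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in y], rng=rng)
            for c in sorted(set(y))}


def predict_one(model, x):
    """Pick the label whose binary SVM gives the highest score."""
    xi = list(x) + [1.0]
    return max(model, key=lambda c: sum(a * b for a, b in zip(model[c], xi)))


def train_bagged_svms(X, y, n_models=10, seed=0):
    """Bagging: each member is trained on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_models):
        picks = [rng.randrange(len(X)) for _ in range(len(X))]
        ensemble.append(train_multiclass_svm([X[i] for i in picks],
                                             [y[i] for i in picks], rng))
    return ensemble


def predict_bagged(ensemble, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_one(member, x) for member in ensemble)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic stand-in features; real MFCC, energy and F0 values would come
    # from an audio front end and would normally be standardised first.
    rng = random.Random(1)
    centers = {"angry": (2.0, 0.0, 0.0), "neutral": (0.0, 2.0, 0.0), "sad": (0.0, 0.0, 2.0)}
    X, y = [], []
    for label, center in centers.items():
        for _ in range(20):
            mfcc = [c + rng.gauss(0, 0.3) for c in center]
            X.append(fuse(mfcc, energy=rng.gauss(0, 0.3), f0=rng.gauss(0, 0.3)))
            y.append(label)
    ensemble = train_bagged_svms(X, y, n_models=5)
    print(predict_bagged(ensemble, X[0]))
```

Fusing at the feature level means every ensemble member sees one concatenated vector per utterance, so the vote combines models that differ only in their bootstrap sample and not in their inputs.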