The recognition of emotion from speech acoustics is an important problem in human-machine interaction, with many potential applications. In this paper, we first compare four ways to extend binary support vector machines (SVMs) to multiclass classification for recognising emotions from speech: two standard SVM schemes (one-versus-one and one-versus-rest) and two methods (DAG and UDT) that form a hierarchy of classifiers, each making a distinct binary decision about class membership. These are trained and tested on 6552 features per speech sample, extracted with the OpenEAR toolkit from three databases of acted emotional speech (DES, Berlin and Serbian) and a database of spontaneous speech (the FAU Aibo Emotion Corpus). Analysis of the errors made by these classifiers leads us to apply non-metric multidimensional scaling (NMDS) to produce a compact, two-dimensional representation of the data suitable for guiding the choice of decision hierarchy. This representation can be interpreted in terms of the well-known valence-arousal model of emotion. We find that this model does not fit the data particularly well: although the arousal dimension is easily identified, valence is not well represented in the transformed data. We therefore describe a new hierarchical classification technique whose structure is derived from the NMDS representation, which we call Data-Driven Dimensional Emotion Classification (3DEC). This new method is compared with the best of the four classifiers studied earlier and with a state-of-the-art classification method on all four databases. We find no significant difference between the three approaches in speaker-dependent performance; however, for the much more interesting and important case of speaker-independent emotion classification, 3DEC significantly outperforms its competitors.
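To make the pipeline concrete, the following is a minimal sketch (not the authors' implementation) of two of the standard multiclass SVM schemes and the confusion-driven NMDS step, written with scikit-learn. Synthetic feature vectors stand in for the 6552-dimensional OpenEAR features, and all dataset sizes, class counts and kernel settings here are illustrative assumptions.

```python
# Illustrative sketch only: synthetic data replaces the OpenEAR features,
# and the DAG/UDT hierarchies and 3DEC itself are not reproduced here.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.manifold import MDS

# Stand-in for per-sample acoustic feature vectors (assumed sizes).
X, y = make_classification(n_samples=600, n_features=50, n_informative=20,
                           n_classes=5, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# Scheme 1: one-versus-one (SVC's native multiclass strategy).
ovo = SVC(kernel='rbf', decision_function_shape='ovo').fit(X_tr, y_tr)
# Scheme 2: one-versus-rest.
ovr = OneVsRestClassifier(SVC(kernel='rbf')).fit(X_tr, y_tr)

for name, clf in [('one-vs-one', ovo), ('one-vs-rest', ovr)]:
    print(name, 'accuracy:', accuracy_score(y_te, clf.predict(X_te)))

# Turn the confusion matrix into a symmetric dissimilarity between
# classes: emotions that are frequently confused count as "close".
C = confusion_matrix(y_te, ovo.predict(X_te)).astype(float)
C /= C.sum(axis=1, keepdims=True)        # row-normalise to rates
dissimilarity = 1.0 - (C + C.T) / 2.0    # symmetrise and invert
np.fill_diagonal(dissimilarity, 0.0)

# Non-metric MDS embeds the classes in two dimensions; in the paper,
# a layout of this kind guides the structure of the decision hierarchy.
nmds = MDS(n_components=2, metric=False, dissimilarity='precomputed',
           random_state=0)
coords = nmds.fit_transform(dissimilarity)
print('2-D class coordinates:\n', coords)
```

The sketch stops at the two-dimensional embedding; in the paper, the class layout it produces is what determines the binary splits of the 3DEC hierarchy, rather than an a-priori valence-arousal structure.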