Emotional vocal expressions recognition using the COST 2102 Italian database of emotional speech

Authors:
Hicham Atassi;Maria Teresa Riviello;Zdeněk Smékal;Amir Hussain;Anna Esposito
Affiliations:
Department of Computing Science and Mathematics, University of Stirling, UK;Department of Psychology and IIASS, Second University of Naples, Italy;Department of Telecommunications, Brno University of Technology, Czech Republic;Department of Computing Science and Mathematics, University of Stirling, UK;Department of Psychology and IIASS, Second University of Naples, Italy
Venue:
COST'09 Proceedings of the Second international conference on Development of Multimodal Interfaces: active Listening and Synchrony
Year:
2009

Abstract

This paper proposes a new speaker-independent approach to classifying emotional vocal expressions, using the COST 2102 Italian database of emotional speech. The audio recordings, extracted from video clips of Italian movies, possess a certain degree of spontaneity and are either noisy or slightly degraded by an interruption, which makes the collected stimuli more realistic than those in available emotional databases containing utterances recorded under studio conditions. The audio stimuli represent six basic emotional states: happiness, sarcasm/irony, fear, anger, surprise, and sadness. Under these more realistic conditions, and using a speaker-independent approach, the proposed system classifies the emotions under examination with 60.7% accuracy. It uses a hierarchical structure consisting of a Perceptron and fifteen Gaussian Mixture Models (GMMs), one for each pair of the six emotions, each trained to distinguish between the two emotions of its pair. The features with the highest discriminative power were selected from a large set of spectral, prosodic, and voice quality features using the Sequential Floating Forward Selection (SFFS) algorithm. The results were compared with the subjective evaluation of the stimuli provided by human subjects.
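
The pairwise structure described in the abstract can be illustrated with a minimal sketch. It only shows why six emotions give fifteen pair models and how pairwise decisions might be combined. The combination step here is a plain majority vote, which is an assumption standing in for the paper's Perceptron stage, whose details the abstract does not give. The names `PAIRS`, `classify`, and `toy_winner` are hypothetical, and `toy_winner` is a stand-in for trained GMM pair models, not a real classifier.

```python
from collections import Counter
from itertools import combinations

# The six emotional states listed in the abstract.
EMOTIONS = ["happiness", "sarcasm/irony", "fear", "anger", "surprise", "sadness"]

# One binary model per unordered pair of emotions: C(6, 2) = 15.
PAIRS = list(combinations(EMOTIONS, 2))


def classify(pairwise_winner):
    """Combine the 15 pairwise decisions by majority vote.

    Assumption: majority voting replaces the paper's Perceptron combiner.
    `pairwise_winner` is a hypothetical callable that takes
    (emotion_a, emotion_b) and returns whichever of the two the
    corresponding pair model prefers for the current utterance.
    """
    votes = Counter(pairwise_winner(a, b) for a, b in PAIRS)
    return votes.most_common(1)[0][0]


def toy_winner(a, b):
    """Toy stand-in for trained pair models: prefers "fear" whenever it
    is one of the two candidates, otherwise the first emotion of the pair."""
    return "fear" if "fear" in (a, b) else a


print(len(PAIRS))
print(classify(toy_winner))
```

In a real system each call to `pairwise_winner` would compare the likelihoods of an utterance's feature vectors under the two GMMs of that pair. The toy function only makes the voting logic executable.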