Feature Analysis and Evaluation for Automatic Emotion Identification in Speech

Authors:
I. Luengo; E. Navas; I. Hernáez
Affiliations:
Dept. of Electron. & Telecommun., Univ. of the Basque Country, Bilbao, Spain
Venue:
IEEE Transactions on Multimedia
Year:
2010

Citing 0
Cited 6

Statistical analysis of complementary spectral features of emotional speech in Czech and Slovak
TSD'11 Proceedings of the 14th international conference on Text, speech and dialogue

Dimensionality reduction and classification analysis on the audio section of the SEMAINE database
ACII'11 Proceedings of the 4th international conference on Affective computing and intelligent interaction - Volume Part II

Duration modeling for emotional speech
ICICA'12 Proceedings of the Third international conference on Information Computing and Applications

Actor level emotion magnitude prediction in text and speech
Multimedia Tools and Applications

Comparison of complementary spectral features of emotional speech for German, Czech, and Slovak
COST'11 Proceedings of the 2011 international conference on Cognitive Behavioural Systems

Multi classifier systems and forward backward feature selection algorithms to classify emotional coloured speech
Proceedings of the 15th ACM on International conference on multimodal interaction

Abstract

The definition of parameters is a crucial step in the development of a system for identifying emotions in speech. Although there is no agreement on which features are best for this task, it is generally accepted that prosody carries most of the emotional information. Most works in the field use some kind of prosodic features, often in combination with spectral and voice quality parametrizations. Nevertheless, no systematic study comparing these features has been carried out. This paper analyzes the characteristics of features derived from prosody, spectral envelope, and voice quality, as well as their capability to discriminate emotions. In addition, early fusion and late fusion techniques for combining different information sources are evaluated. The results of this analysis are validated with experimental automatic emotion identification tests. Results suggest that spectral envelope features outperform the prosodic ones. Moreover, the late fusion of long-term spectral statistics with short-term spectral envelope parameters provides an accuracy comparable to that obtained when all parametrizations are combined.
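
To make the distinction between early and late fusion concrete, the following is a minimal sketch, not the authors' system. It assumes the common reading of the two terms: early fusion concatenates the feature vectors before a single classifier, while late fusion combines the scores of separate per-stream classifiers. The nearest-centroid classifier, the softmax pseudo-posteriors, the weight `w`, the emotion labels, and the toy feature values are all illustrative assumptions that do not come from the paper.

```python
import math

def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def train(features, labels):
    """Hypothetical stand-in classifier: one centroid per emotion label."""
    by_label = {}
    for x, y in zip(features, labels):
        by_label.setdefault(y, []).append(x)
    return {y: centroid(rows) for y, rows in by_label.items()}

def scores(model, x):
    """Softmax over negative Euclidean distances, used as pseudo-posteriors."""
    neg = {y: -math.dist(x, c) for y, c in model.items()}
    z = sum(math.exp(v) for v in neg.values())
    return {y: math.exp(v) / z for y, v in neg.items()}

def early_fusion_predict(model, prosodic, spectral):
    """Early fusion: concatenate the streams, then classify once."""
    s = scores(model, prosodic + spectral)
    return max(s, key=s.get)

def late_fusion_predict(model_p, model_s, prosodic, spectral, w=0.5):
    """Late fusion: classify each stream separately, then mix the scores.
    The weight w is an assumed free parameter of this sketch."""
    sp = scores(model_p, prosodic)
    ss = scores(model_s, spectral)
    fused = {y: w * sp[y] + (1 - w) * ss[y] for y in sp}
    return max(fused, key=fused.get)

# Toy data (invented): two prosodic and three spectral values per utterance.
prosodic = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.2, 0.9]]
spectral = [[0.5, 1.0, 0.0], [0.6, 0.9, 0.1], [0.4, 0.1, 1.0], [0.5, 0.0, 0.9]]
labels = ["anger", "anger", "sadness", "sadness"]

early_model = train([p + s for p, s in zip(prosodic, spectral)], labels)
model_p = train(prosodic, labels)
model_s = train(spectral, labels)
```

Calling `early_fusion_predict(early_model, p, s)` and `late_fusion_predict(model_p, model_s, p, s)` on the same utterance gives the two fused decisions for comparison. In a realistic setup the per-stream classifiers would be trained models producing calibrated scores, and the fusion weight would be tuned on held-out data.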