Formant position based weighted spectral features for emotion recognition

  • Authors:
  • Elif Bozkurt; Engin Erzin; Çiğdem Eroğlu Erdem; A. Tanju Erdem

  • Affiliations:
  • Multimedia, Vision and Graphics Laboratory, College of Engineering, Koç University, 34450 Sarıyer, Istanbul, Turkey; Multimedia, Vision and Graphics Laboratory, College of Engineering, Koç University, 34450 Sarıyer, Istanbul, Turkey; Department of Electrical and Electronics Engineering, Bahçeşehir University, 34353 Beşiktaş, Istanbul, Turkey; Department of Electrical and Electronics Engineering, Özyeğin University, 34662 Üsküdar, Istanbul, Turkey

  • Venue:
  • Speech Communication
  • Year:
  • 2011


Abstract

In this paper, we propose novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech. The idea is based on the fact that formant locations carry emotion-related information, and therefore critical spectral bands around formant locations can be emphasized during the calculation of MFCC features. The spectral weighting is derived from the normalized inverse harmonic mean function of the line spectral frequency (LSF) features, which are known to be localized around formant frequencies. The above approach can be considered an early data fusion of spectral content and formant location information. We also investigate methods for late decision fusion of unimodal classifiers. We evaluate the proposed WMFCC features together with the standard spectral and prosody features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus. The results show that unimodal classifiers with the WMFCC features perform significantly better than the classifiers with standard spectral features. Late decision fusion of classifiers provides further significant performance improvements.
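To make the WMFCC idea concrete, the sketch below illustrates one plausible reading of the pipeline described in the abstract: per frame, compute LPC-derived LSFs, form an inverse-harmonic-mean weighting that is large where neighbouring LSFs cluster (i.e., near formants), interpolate and normalize that weighting over the FFT bins, and apply it to the power spectrum before mel filtering and the DCT. The LPC order, the linear interpolation of weights to FFT bins, and all parameter values here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.fft import dct

def lpc(frame, order):
    """Autocorrelation-method LPC: returns A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz(r[:order], -r[1:order + 1])
    return np.concatenate(([1.0], a))

def lsf(a):
    """Line spectral frequencies (rad) from LPC polynomial A(z)."""
    a_rev = a[::-1]
    P = np.concatenate((a, [0.0])) + np.concatenate(([0.0], a_rev))  # symmetric
    Q = np.concatenate((a, [0.0])) - np.concatenate(([0.0], a_rev))  # antisymmetric
    ang = np.concatenate((np.angle(np.roots(P)), np.angle(np.roots(Q))))
    return np.sort(ang[(ang > 1e-4) & (ang < np.pi - 1e-4)])

def ihm_weights(lsfs):
    """Inverse harmonic mean of distances to neighbouring LSFs (0 and pi as end points)."""
    ext = np.concatenate(([0.0], lsfs, [np.pi]))
    return 1.0 / (ext[1:-1] - ext[:-2]) + 1.0 / (ext[2:] - ext[1:-1])

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filterbank over the rFFT bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def wmfcc(frame, sr, order=12, n_fft=512, n_filters=26, n_ceps=13):
    """Spectrally weighted MFCCs for one speech frame (illustrative parameters)."""
    frame = frame * np.hamming(len(frame))
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2

    lsfs = lsf(lpc(frame, order))                       # LSFs cluster around formants
    w_lsf = ihm_weights(lsfs)                           # large where adjacent LSFs are close
    bin_freqs = np.linspace(0.0, np.pi, n_fft // 2 + 1)
    w = np.interp(bin_freqs, lsfs, w_lsf)               # assumed: interpolate weights to bins
    w /= w.sum()                                        # normalized weighting function

    energies = mel_filterbank(n_filters, n_fft, sr) @ (w * spec)
    return dct(np.log(energies + 1e-10), norm="ortho")[:n_ceps]
```

In this reading, the weighting acts as an early fusion step: formant-location information (via the LSFs) reshapes the spectrum before the standard mel filtering and DCT, so the resulting cepstra emphasize the emotion-relevant bands around the formants.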