Affect recognition is a crucial requirement for future human-machine interfaces to respond effectively to the nonverbal behaviors of the user. Speech emotion recognition systems analyze acoustic features to infer the speaker's emotional state. However, the human voice conveys a mixture of information, including speaker, lexical, cultural, physiological, and emotional traits. These communication aspects introduce variabilities that degrade the performance of an emotion recognition system, so building robust emotional models requires careful compensation for their effects. This study aims to factorize speaker characteristics, verbal content, and expressive behaviors in various acoustic features. The factorization technique consists of building phoneme-level trajectory models for the features. We propose a metric to quantify the dependency between acoustic features and communication traits (i.e., speaker, lexical, and emotional factors). This metric, motivated by the mutual information framework, estimates the reduction in uncertainty about the trajectory models when a given trait is known. The analysis provides important insights into the dependency between the features and the aforementioned factors. Motivated by these results, we propose a feature normalization technique based on the whitening transformation that compensates for speaker and lexical variabilities. The benefit of this normalization scheme is validated with the presented factor analysis method. The emotion recognition experiments show that the normalization approach can attenuate the variability imposed by the verbal content and speaker identity, yielding 4.1% and 2.4% relative performance improvements, respectively, on a selected set of features.
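To make the dependency metric concrete, here is a minimal sketch of the mutual-information formulation it is motivated by; the paper's exact estimator over phoneme-level trajectory models may differ. If X denotes the feature trajectory and F a communication trait (speaker, lexical, or emotional factor), the reduction in uncertainty about X once F is known is the standard mutual information:

    I(X;F) \;=\; H(X) - H(X \mid F) \;=\; H(X) - \sum_{f} p(f)\, H(X \mid F = f)

A large I(X;F) indicates that the acoustic feature is strongly tied to that trait; for example, a feature with high mutual information with speaker identity carries speaker variability that a normalization step should remove before emotion classification.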
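The whitening-based normalization can likewise be illustrated with a minimal sketch. The Python/NumPy snippet below applies a whitening transformation separately within each group of frames (e.g., per speaker or per phoneme), removing group-specific mean and covariance structure; the function names whitening_matrix and whiten_per_group, and the per-group application strategy, are illustrative assumptions rather than the paper's implementation.

    import numpy as np

    def whitening_matrix(x, eps=1e-8):
        # Symmetric (ZCA) whitening matrix from the eigendecomposition of the
        # feature covariance; eps guards against near-zero eigenvalues.
        cov = np.cov(x, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)
        return eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T

    def whiten_per_group(x, groups):
        # Whiten frame-level features separately within each group (e.g., each
        # speaker or each phoneme), removing group-specific mean and covariance.
        out = np.empty_like(x, dtype=float)
        for g in np.unique(groups):
            idx = groups == g
            mu = x[idx].mean(axis=0)
            out[idx] = (x[idx] - mu) @ whitening_matrix(x[idx])
        return out

    # Hypothetical usage: 13 acoustic features per frame, 4 speakers.
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(1000, 13))
    speakers = rng.integers(0, 4, size=1000)
    feats_norm = whiten_per_group(feats, speakers)

Because whitening is applied within each group, any variation that is constant for a given speaker (or phoneme) is removed before the features reach the emotion classifier, which matches the stated goal of attenuating speaker and lexical variability while preserving the residual expressive variation.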