The most common approaches to automatic emotion recognition rely on utterance-level prosodic features. Recent studies have shown that utterance-level statistics of segmental spectral features also contain rich information about expressivity and emotion. In our work we introduce a more fine-grained yet robust set of spectral features: statistics of Mel-Frequency Cepstral Coefficients computed over three phoneme-type classes of interest: stressed vowels, unstressed vowels, and consonants in the utterance. We investigate the performance of our features on the task of speaker-independent emotion recognition using two publicly available datasets. Our experimental results clearly indicate that both the richer set of spectral features and the differentiation between phoneme-type classes are beneficial for the task. Classification accuracies are consistently higher for our features than for prosodic or utterance-level spectral features, and combining our phoneme-class features with prosodic features leads to further improvement. Given the large number of class-level spectral features, we expected feature selection to improve results further, but none of several selection methods led to clear gains. Further analyses reveal that spectral features computed from consonant regions of the utterance contain more information about emotion than either stressed- or unstressed-vowel features. We also explore how emotion recognition accuracy depends on utterance length. We show that, while there is no significant dependence for utterance-level prosodic features, the accuracy of emotion recognition using class-level spectral features increases with utterance length.
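The class-level feature extraction described above can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: it assumes MFCC frames have already been computed (e.g. with a standard front end) and that each frame carries a phoneme-type label obtained from a forced alignment; the function name, the label strings, and the zero-padding for absent classes are all illustrative assumptions.

```python
import numpy as np

# Hypothetical phoneme-type classes, following the paper's grouping.
CLASSES = ("stressed_vowel", "unstressed_vowel", "consonant")

def class_level_mfcc_stats(mfcc, frame_labels, classes=CLASSES):
    """Concatenate per-class mean and std of MFCC frames.

    mfcc: array of shape (n_frames, n_coeffs).
    frame_labels: per-frame phoneme-type label (e.g. from a forced
        alignment); frames with other labels (silence, etc.) are skipped.
    Returns a vector of length 2 * n_coeffs * len(classes).
    """
    frame_labels = np.asarray(frame_labels, dtype=object)
    parts = []
    for c in classes:
        rows = mfcc[frame_labels == c]
        if len(rows) == 0:
            # Class absent from the utterance: zero-pad (an assumption).
            parts.append(np.zeros(2 * mfcc.shape[1]))
        else:
            parts.append(np.concatenate([rows.mean(axis=0),
                                         rows.std(axis=0)]))
    return np.concatenate(parts)

# Toy example: 4 frames of 3 coefficients each.
mfcc = np.arange(12, dtype=float).reshape(4, 3)
labels = ["consonant", "stressed_vowel", "consonant", "unstressed_vowel"]
vec = class_level_mfcc_stats(mfcc, labels)
```

The resulting fixed-length vector can then be fed to any utterance-level classifier (the paper uses standard speaker-independent classification).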