Anger recognition in speech using acoustic and linguistic cues

  • Authors:
  • Tim Polzehl; Alexander Schmitt; Florian Metze; Michael Wagner

  • Affiliations:
  • Tim Polzehl: Quality and Usability Lab, Technische Universität Berlin, and Quality and Usability Lab, Deutsche Telekom Laboratories, both at Ernst-Reuter-Platz 7, D-10587 Berlin, Germany
  • Alexander Schmitt: Dialogue Systems Group and Institute of Information Technology, University of Ulm, Albert-Einstein-Allee 43, D-89081 Ulm, Germany
  • Florian Metze: Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
  • Michael Wagner: National Centre for Biometric Studies, University of Canberra, ACT 2601, Australia

  • Venue:
  • Speech Communication
  • Year:
  • 2011

Abstract

The present study investigates the combined use of linguistic and acoustic feature modeling for anger classification. For acoustic modeling we compute statistics over acoustic audio descriptors such as pitch, loudness, and spectral characteristics. Ranking the features shows that loudness and MFCC features are the most promising across all databases; for the English database, pitch features are also important. For linguistic modeling we apply probabilistic and entropy-based models of words and phrases: Bag-of-Words (BOW), Term Frequency (TF), Term Frequency-Inverse Document Frequency (TF.IDF), and Self-Referential Information (SRI). SRI clearly outperforms the vector-space models, and modeling phrases improves the scores slightly. After classifying the acoustic and linguistic information separately, we fuse the results at the decision level by adding confidence scores. We compare the scores obtained on three databases: two from the IVR customer-care domain and one from a Wizard-of-Oz (WoZ) data collection, all recorded under realistic speech conditions. We observe promising results for the IVR databases, while the WoZ database yields lower scores overall. To make the results comparable, we evaluate classification success with the F1 measure in addition to overall accuracy. Overall, acoustic modeling clearly outperforms linguistic modeling, and fusion slightly improves the scores. Against a baseline of approximately 60% accuracy and .40 F1 from constant majority-class voting, we obtain 75% accuracy and .70 F1 on the WoZ database. For the IVR databases we obtain approximately 79% accuracy and .78 F1 over a baseline of 60% accuracy and .38 F1.
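To make the acoustic modeling step concrete, the sketch below derives utterance-level statistics from frame-wise pitch, loudness, and MFCC tracks. It is a minimal illustration, not the authors' feature extractor: librosa, the 8 kHz sampling rate (typical of IVR audio), the descriptor choices (YIN pitch, RMS energy as a loudness proxy, 13 MFCCs), and the four statistics are all assumptions made here.

```python
# Hedged sketch: utterance-level statistics over acoustic descriptors.
# librosa, the descriptor set, and the chosen statistics are illustrative.
import numpy as np
import librosa

def acoustic_stats(path, sr=8000):
    y, sr = librosa.load(path, sr=sr)                   # narrowband, IVR-style audio
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch contour (Hz)
    rms = librosa.feature.rms(y=y)[0]                   # frame energy, a loudness proxy
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral characteristics
    feats = {}
    for name, track in (("pitch", f0), ("loudness", rms)):
        for stat, fn in (("mean", np.mean), ("std", np.std),
                         ("min", np.min), ("max", np.max)):
            feats[f"{name}_{stat}"] = float(fn(track))
    for i, coeff in enumerate(mfcc):                    # statistics per MFCC coefficient
        feats[f"mfcc{i}_mean"] = float(np.mean(coeff))
        feats[f"mfcc{i}_std"] = float(np.std(coeff))
    return feats
```

The resulting fixed-length vector of statistics can feed any standard classifier, matching the abstract's description of generating statistics from frame-level audio descriptors.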
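On the linguistic side, the vector-space baselines (BOW, TF, TF.IDF) are standard bag-of-words weightings available off the shelf, e.g. via scikit-learn's TfidfVectorizer. The entropy-based SRI score is sketched below under the assumption that it follows the common pointwise formulation log(P(c|w)/P(c)) for a word w and an emotion class c; the paper's exact smoothing and phrase handling may differ.

```python
# Hedged sketch of a Self-Referential Information (SRI) word score,
# assuming the pointwise formulation log(P(c|w) / P(c)).
import math
from collections import Counter

def sri_scores(utterances, labels, eps=1e-9):
    """utterances: list of token lists; labels: parallel list of class labels."""
    n = len(labels)
    class_count = Counter(labels)            # counts for P(c)
    word_count = Counter()                   # utterances containing w
    word_class = Counter()                   # utterances containing w with label c
    for tokens, label in zip(utterances, labels):
        for w in set(tokens):
            word_count[w] += 1
            word_class[(w, label)] += 1
    scores = {}
    for (w, c), joint in word_class.items():
        p_c = class_count[c] / n
        p_c_given_w = joint / word_count[w]
        scores[(w, c)] = math.log((p_c_given_w + eps) / (p_c + eps))
    return scores
```

An utterance can then be scored for the anger class by summing or averaging the SRI scores of its words; replacing the token lists with n-grams would cover the phrase modeling the abstract mentions.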
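Finally, the decision-level fusion the abstract describes, adding the confidences of the separately trained acoustic and linguistic classifiers, together with the accuracy and F1 evaluation, reduces to a few lines. The uniform 0.5 weight and the macro-averaged scikit-learn F1 call are assumptions for illustration, not values taken from the paper.

```python
# Hedged sketch: fuse per-class confidences by (weighted) addition, then
# evaluate accuracy and F1. The 0.5 weight and macro averaging are assumptions.
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def fuse_and_score(conf_acoustic, conf_linguistic, y_true, w=0.5):
    """conf_*: arrays of shape (n_samples, n_classes) with classifier confidences."""
    fused = w * np.asarray(conf_acoustic) + (1.0 - w) * np.asarray(conf_linguistic)
    y_pred = fused.argmax(axis=1)            # class with the highest fused confidence
    return (accuracy_score(y_true, y_pred),
            f1_score(y_true, y_pred, average="macro"))
```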