Fusion of Acoustic and Linguistic Features for Emotion Detection

Authors:
Florian Metze;Tim Polzehl;Michael Wagner
Affiliations:
-;-;-
Venue:
ICSC '09 Proceedings of the 2009 IEEE International Conference on Semantic Computing
Year:
2009

Citing 0
Cited 3

Anger recognition in speech using acoustic and linguistic cues
Speech Communication

Emotion modeling from speech signal based on wavelet packet transform
International Journal of Speech Technology

Automatic detection of deceit in verbal communication
Proceedings of the 15th ACM on International conference on multimodal interaction

Abstract

This paper describes a system that uses acoustic and linguistic information from speech to decide whether an utterance carries negative or non-negative meaning. An earlier version of this system was submitted to the Interspeech-2009 Emotion Challenge evaluation. The speech data consist of short utterances of children's speech, and the proposed system is designed to detect anger in each given chunk. Various frame-based cepstral, prosodic and acoustic features are extracted automatically and classified by means of a support vector machine. An automatic speech recognizer transcribes the utterances and yields a separate classification, based on the degree of emotional salience of the words. The emotionally salient words are computed on word hypotheses, so that untranscribed training data is sufficient. Late fusion is applied to make a final decision on anger vs. non-anger of the utterance.
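
The abstract does not give formulas for emotional salience or for the fusion rule, so the following is only a minimal sketch under stated assumptions. It assumes salience is the commonly used information-theoretic measure (the divergence between the class distribution given a word and the class prior), and it assumes late fusion is a weighted average of the two classifiers' anger scores. The function names, the add-one smoothing, the fusion weight and the threshold are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def emotional_salience(utterances, labels, smoothing=1.0):
    """Return {word: salience} from parallel lists of word lists and labels.

    Assumed definition: sum over classes c of P(c|w) * log(P(c|w) / P(c)).
    The word lists could be recognizer hypotheses rather than transcripts.
    """
    classes = sorted(set(labels))
    prior = {c: labels.count(c) / len(labels) for c in classes}

    # Count, per word, how many utterances of each class contain it.
    word_class = defaultdict(Counter)
    for words, label in zip(utterances, labels):
        for w in set(words):
            word_class[w][label] += 1

    salience = {}
    for w, counts in word_class.items():
        total = sum(counts.values()) + smoothing * len(classes)
        score = 0.0
        for c in classes:
            p = (counts[c] + smoothing) / total  # smoothed P(c | w)
            score += p * math.log(p / prior[c])
        salience[w] = score
    return salience


def late_fusion(p_acoustic, p_linguistic, weight=0.5, threshold=0.5):
    """Combine two anger scores in [0, 1]; weight and threshold are assumptions."""
    fused = weight * p_acoustic + (1.0 - weight) * p_linguistic
    return fused, fused >= threshold


# Toy data, invented purely for illustration.
utterances = [["stop", "it"], ["stop", "now"], ["good", "it"], ["good", "dog"]]
labels = ["anger", "anger", "non-anger", "non-anger"]
sal = emotional_salience(utterances, labels)
fused_score, is_anger = late_fusion(0.8, 0.4)
```

In this toy data, "stop" occurs only in anger utterances and so receives a higher salience than "it", which occurs equally often in both classes. In practice the fusion weight would be tuned on held-out data.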