Fusion of Acoustic and Linguistic Features for Emotion Detection

  • Authors:
  • Florian Metze; Tim Polzehl; Michael Wagner

  • Venue:
  • ICSC '09 Proceedings of the 2009 IEEE International Conference on Semantic Computing
  • Year:
  • 2009

Abstract

This paper describes a system that uses acoustic and linguistic information from speech to decide whether an utterance carries negative or non-negative meaning. An earlier version of this system was submitted to the Interspeech-2009 Emotion Challenge evaluation. The speech data consist of short utterances of children's speech, and the proposed system is designed to detect anger in each given chunk. Various frame-based cepstral, prosodic, and acoustic features are extracted automatically and classified by a support vector machine. An automatic speech recognizer transcribes the utterances and yields a separate classification based on the degree of emotional salience of the words. Emotional salience is computed on word hypotheses, so untranscribed training data is sufficient. Late fusion is applied to make the final anger vs. non-anger decision for the utterance.
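
The linguistic branch and the late-fusion step can be illustrated with a minimal sketch. The abstract does not give the salience formula or the fusion rule, so everything here is an assumption: salience is taken as the commonly used information-theoretic score sum_k P(e_k | w) * log(P(e_k | w) / P(e_k)), fusion is taken as a weighted average of two per-utterance anger scores in [0, 1], and the names emotional_salience, late_fusion, weight, and threshold are hypothetical.

```python
import math
from collections import Counter, defaultdict


def emotional_salience(utterances, labels, smoothing=1.0):
    """Score each word by how strongly it shifts the class distribution.

    Assumed definition (not taken from the paper):
        sal(w) = sum_k P(e_k | w) * log(P(e_k | w) / P(e_k))
    `utterances` is a list of word lists (e.g. ASR word hypotheses),
    `labels` the emotion class of each utterance.
    """
    classes = sorted(set(labels))
    class_counts = Counter(labels)
    prior = {c: class_counts[c] / len(labels) for c in classes}

    # Count, per word, how many utterances of each class contain it.
    word_class = defaultdict(Counter)
    for words, label in zip(utterances, labels):
        for w in set(words):
            word_class[w][label] += 1

    salience = {}
    for w, counts in word_class.items():
        # Additive smoothing keeps every posterior strictly positive.
        denom = sum(counts.values()) + smoothing * len(classes)
        score = 0.0
        for c in classes:
            posterior = (counts[c] + smoothing) / denom
            score += posterior * math.log(posterior / prior[c])
        salience[w] = score
    return salience


def late_fusion(acoustic_score, linguistic_score, weight=0.5, threshold=0.5):
    """Combine two per-utterance anger scores after separate classification.

    Assumed rule (not taken from the paper): weighted average, then threshold.
    """
    fused = weight * acoustic_score + (1.0 - weight) * linguistic_score
    decision = "anger" if fused >= threshold else "non-anger"
    return decision, fused


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    utterances = [["no", "stop"], ["no", "bad"], ["the", "no"],
                  ["yes", "good"], ["yes", "ok"], ["the", "yes"]]
    labels = ["anger", "anger", "anger",
              "non-anger", "non-anger", "non-anger"]
    for word, score in sorted(emotional_salience(utterances, labels).items()):
        print(f"{word}: {score:.3f}")
    print(late_fusion(acoustic_score=0.9, linguistic_score=0.6))
```

In this sketch a word that occurs equally often in both classes receives a salience near zero, while a word seen only in angry utterances receives a positive score. In practice the fusion weight would be tuned on held-out data rather than fixed at 0.5.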