Expression of affect in spontaneous speech: Acoustic correlates and automatic detection of irritation and resignation

  • Authors:
  • Petri Laukka; Daniel Neiberg; Mimmi Forsell; Inger Karlsson; Kjell Elenius

  • Affiliations:
  • Department of Psychology, Uppsala University, Uppsala, Sweden and Department of Education and Psychology, University of Gävle, Gävle, Sweden; Centre for Speech Technology, Department of Speech, Music and Hearing, KTH, Stockholm, Sweden; Centre for Speech Technology, Department of Speech, Music and Hearing, KTH, Stockholm, Sweden; Centre for Speech Technology, Department of Speech, Music and Hearing, KTH, Stockholm, Sweden; Centre for Speech Technology, Department of Speech, Music and Hearing, KTH, Stockholm, Sweden

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2011

Abstract

The majority of previous studies on vocal expression have been conducted on posed expressions. In contrast, we utilized a large corpus of authentic affective speech recorded from real-life voice-controlled telephone services. Listeners rated a selection of 200 utterances from this corpus with regard to level of perceived irritation, resignation, neutrality, and emotion intensity. The selected utterances came from 64 different speakers who each provided both neutral and affective stimuli. All utterances were further automatically analyzed regarding a comprehensive set of acoustic measures related to F0, intensity, formants, voice source, and temporal characteristics of speech. First, a within-persons design revealed several significant acoustic differences between utterances classified as neutral and utterances classified as irritated or resigned. Second, listeners' ratings on each scale were associated with several acoustic measures. In general, the acoustic correlates of irritation, resignation, and emotion intensity were similar to previous findings obtained with posed expressions, though the effect sizes were smaller for the authentic expressions. Third, automatic classification of irritated, resigned, and neutral utterances (using LDA classifiers both with and without speaker adaptation) performed at a level comparable to human performance, though human listeners and machines did not necessarily classify individual utterances similarly. Fourth, clearly perceived exemplars of irritation and resignation were rare in our corpus. These findings are discussed in relation to future research.