On the impact of children's emotional speech on acoustic and language models

  • Authors:
  • Stefan Steidl (Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany)
  • Anton Batliner (Lehrstuhl für Mustererkennung, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany)
  • Dino Seppi (ESAT, Katholieke Universiteit Leuven, Heverlee, Leuven, Belgium)
  • Björn Schuller (Institute for Human-Machine Communication, Technische Universität München, München, Germany)

  • Venue:
  • EURASIP Journal on Audio, Speech, and Music Processing - Special issue on atypical speech
  • Year:
  • 2010


Abstract

The automatic recognition of children's speech is well known to be a challenge, and so is the influence of affect, which is believed to degrade the performance of a speech recogniser. In this contribution, we investigate the combination of both phenomena. Extensive test runs are carried out for 1k-vocabulary continuous speech recognition on spontaneous motherese, emphatic, and angry children's speech as opposed to neutral speech. The experiments address the question of how specific emotions influence word accuracy. In a first scenario, "emotional" speech recognisers are compared to a speech recogniser trained on neutral speech only. For this comparison, equal amounts of training data are used for each emotion-related state. In a second scenario, a "neutral" speech recogniser trained on large amounts of neutral speech is adapted by adding a small amount of emotionally coloured data to the training process. The results show that emphatic and angry speech is recognised best, even better than neutral speech, and that performance can be improved further by adapting the acoustic and linguistic models. To illustrate the variability of emotional speech, we visualise the distribution of the four emotion-related states in the MFCC space by applying a Sammon transformation.
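As a rough illustration of this visualisation step, the sketch below computes one mean MFCC vector per utterance and projects the set to two dimensions with a plain gradient-descent Sammon mapping. This is a minimal sketch, not the authors' pipeline: the file names, emotion labels, feature settings, learning rate, and iteration count are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's configuration): project
# utterance-level mean MFCC vectors to 2-D with a Sammon mapping.
import numpy as np
import librosa
import matplotlib.pyplot as plt

def sammon_2d(X, n_iter=500, lr=0.3, eps=1e-9, seed=0):
    """Gradient-descent Sammon mapping of X (n x d) down to 2-D."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Pairwise Euclidean distances in the original MFCC space.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    D = np.maximum(D, eps)
    c = D[np.triu_indices(n, 1)].sum()        # normalising constant
    Y = 1e-2 * rng.standard_normal((n, 2))    # random 2-D start
    for _ in range(n_iter):
        d = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
        d = np.maximum(d, eps)
        ratio = (D - d) / (D * d)
        np.fill_diagonal(ratio, 0.0)
        # Gradient of the Sammon stress E = (1/c) * sum_{i<j} (D-d)^2 / D.
        grad = (-2.0 / c) * (ratio[:, :, None]
                             * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y -= lr * grad
    return Y

# Hypothetical utterance list: (wav_path, emotion_label) pairs.
utterances = [("utt001.wav", "motherese"), ("utt002.wav", "neutral")]  # ...

feats, labels = [], []
for path, label in utterances:
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # coeffs x frames
    feats.append(mfcc.mean(axis=1))       # one mean vector per utterance
    labels.append(label)

Y = sammon_2d(np.vstack(feats))
for state in sorted(set(labels)):
    idx = [i for i, l in enumerate(labels) if l == state]
    plt.scatter(Y[idx, 0], Y[idx, 1], label=state, s=10)
plt.legend()
plt.title("Sammon map of mean MFCC vectors")
plt.show()
```

Unlike PCA, the Sammon stress weights each pair by the inverse of its original distance, so small inter-cluster distances in the MFCC space are preserved preferentially, which is why it suits visualising overlap between emotion-related states.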