Intelligibility rating with automatic speech recognition, prosodic, and cepstral evaluation

  • Authors:
  • Tino Haderlein;Cornelia Moers;Bernd Möbius;Frank Rosanowski;Elmar Nöth

  • Affiliations:
  • University of Erlangen-Nuremberg, Pattern Recognition Lab, Informatik 5, Erlangen, Germany and Department of Phoniatrics and Pedaudiology;University of Bonn, Department of Speech and Communication, Bonn, Germany;Saarland University, Department of Computational Linguistics and Phonetics, Saarbrücken, Germany;University of Erlangen-Nuremberg, Department of Phoniatrics and Pedaudiology, Erlangen, Germany;University of Erlangen-Nuremberg, Pattern Recognition Lab, Informatik 5, Erlangen, Germany

  • Venue:
  • TSD'11: Proceedings of the 14th International Conference on Text, Speech and Dialogue
  • Year:
  • 2011

Abstract

Speech intelligibility is an important criterion in voice rehabilitation. Automatic evaluation of intelligibility has been shown to work well when automatic speech recognition is combined with prosodic analysis. This paper extends that method with measures based on the Cepstral Peak Prominence (CPP). Seventy-three hoarse patients (48.3±16.8 years) uttered the vowel /e/ and read the German version of the text "The North Wind and the Sun". Five speech therapists and physicians rated their intelligibility perceptually on a 5-point scale. Support Vector Regression (SVR) yielded a feature set with a human-machine correlation of r = 0.85; it consists of the word accuracy, the smoothed CPP computed from a speech section, and three prosodic features (normalized energy of word-pause-word intervals, F0 value at voice offset in a word, and standard deviation of jitter). The average human-human correlation was r = 0.82. Hence, the automatic method can provide meaningful objective support for perceptual analysis.
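To make the two reported correlations concrete, the following is a minimal sketch of how a human-machine and an average human-human Pearson correlation could be computed. It is not the authors' implementation. The abstract does not say how the five raters' scores were combined, so the sketch assumes that the machine score is compared with the mean rating per speaker and that each rater is compared with the mean of the remaining raters. The function names (`pearson`, `mean_rating`, `human_machine_r`, `human_human_r`) and the toy numbers are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def mean_rating(ratings):
    """Average score per speaker; ratings[r][s] is rater r's score for speaker s."""
    n_raters = len(ratings)
    return [sum(column) / n_raters for column in zip(*ratings)]


def human_machine_r(ratings, machine_scores):
    """Correlate the machine score with the mean human rating (assumed scheme)."""
    return pearson(mean_rating(ratings), machine_scores)


def human_human_r(ratings):
    """Average correlation of each rater with the mean of the other raters
    (assumed scheme; the abstract does not specify the procedure)."""
    values = []
    for i, rater in enumerate(ratings):
        others = ratings[:i] + ratings[i + 1:]
        values.append(pearson(rater, mean_rating(others)))
    return sum(values) / len(values)


# Toy example: 3 hypothetical raters, 6 hypothetical speakers, 5-point scale.
ratings = [
    [1, 2, 3, 4, 5, 2],
    [1, 3, 3, 5, 4, 2],
    [2, 2, 4, 4, 5, 1],
]
# Hypothetical regression output, e.g. from an SVR trained on the features.
machine_scores = [1.4, 2.1, 3.2, 4.4, 4.6, 1.9]

print(f"human-machine r = {human_machine_r(ratings, machine_scores):.2f}")
print(f"human-human r   = {human_human_r(ratings):.2f}")
```

Under these assumptions, a human-machine correlation that reaches the average human-human correlation would mean the automatic score agrees with the rater panel about as well as a single rater does.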