Modelling speaker intelligibility in noise

  • Authors:
  • Jon Barker; Martin Cooke

  • Affiliations:
  • University of Sheffield, Department of Computer Science, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, United Kingdom; University of Sheffield, Department of Computer Science, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, United Kingdom

  • Venue:
  • Speech Communication
  • Year:
  • 2007

Abstract

This study compared listeners' performance on a multispeaker speech-in-noise task with that of a model inspired by automatic speech recognition techniques. Listeners identified three keywords in simple 6-word sentences presented in speech-shaped noise at a range of signal-to-noise ratios. Sentence material was provided by 18 male and 16 female speakers. An across-speaker analysis of several acoustic parameters (vocal tract length, mean fundamental frequency and speaking rate) found none to be a consistently good predictor of relative intelligibility. A simple measure of the degree of energetic masking was a good predictor of female speech intelligibility, especially in high noise conditions, but failed to account for interspeaker differences in the male group. A glimpsing model, which combined a simulation of energetic masking with speaker-dependent statistical models, produced recognition scores which were fitted to the behavioural data pooled across all speakers. Using a single set of speaker-independent, noise-level-independent parameters, the model not only predicted the intelligibility of individual speakers to a remarkable degree but also accounted for most of the token-wise intelligibilities of the letter keywords. The fit was particularly good in high noise conditions.
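The abstract does not define its "simple measure of degree of energetic masking". As a rough illustration only, the sketch below computes one plausible measure of that kind: the proportion of spectro-temporal cells in which the speech energy exceeds the noise energy by some margin. The function name `glimpse_proportion`, the dB-valued time-frequency grids, and the 3 dB default threshold are all assumptions made for this example, not details taken from the paper.

```python
def glimpse_proportion(speech_db, noise_db, threshold_db=3.0):
    """Fraction of time-frequency cells where speech exceeds noise.

    speech_db, noise_db: equally shaped lists of rows, each row holding
    energies in dB for one time frame (hypothetical representation).
    threshold_db: local SNR a cell must exceed to count as a glimpse
    (assumed value, not taken from the abstract).
    """
    if len(speech_db) != len(noise_db):
        raise ValueError("speech and noise must have the same number of frames")

    glimpsed = 0
    total = 0
    for speech_row, noise_row in zip(speech_db, noise_db):
        if len(speech_row) != len(noise_row):
            raise ValueError("speech and noise frames must have the same length")
        for s, n in zip(speech_row, noise_row):
            total += 1
            # A cell counts as a glimpse when its local SNR exceeds the threshold.
            if s - n > threshold_db:
                glimpsed += 1

    if total == 0:
        raise ValueError("empty input")
    return glimpsed / total


# Toy example: two frames, two frequency channels each.
speech = [[60.0, 40.0], [55.0, 30.0]]
noise = [[50.0, 50.0], [50.0, 50.0]]
proportion = glimpse_proportion(speech, noise)
```

In this toy input the local SNRs are 10, -10, 5 and -20 dB, so two of the four cells pass the 3 dB threshold. A lower proportion would correspond to heavier energetic masking of that speaker's utterance.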