Modelling speaker intelligibility in noise

Authors:
Jon Barker;Martin Cooke
Affiliations:
University of Sheffield, Department of Computer Science, Regent Court, 211 Portobello Street, Sheffield, S1 4DP, United Kingdom (both authors)
Venue:
Speech Communication
Year:
2007

Abstract

This study compared listeners' performance on a multispeaker speech-in-noise task with that of a model inspired by automatic speech recognition techniques. Listeners identified three keywords in simple six-word sentences presented in speech-shaped noise at a range of signal-to-noise ratios. Sentence material was provided by 18 male and 16 female speakers. An across-speaker analysis of several acoustic parameters (vocal tract length, mean fundamental frequency and speaking rate) found none to be a consistently good predictor of relative intelligibility. A simple measure of the degree of energetic masking was a good predictor of female speech intelligibility, especially in high noise conditions, but failed to account for interspeaker differences in the male group. A glimpsing model, which combined a simulation of energetic masking with speaker-dependent statistical models, produced recognition scores which were fitted to the behavioural data pooled across all speakers. Using a single set of speaker-independent, noise-level-independent parameters, the model not only predicted the intelligibility of individual speakers to a remarkable degree, but also accounted for most of the token-wise intelligibilities of the letter keywords. The fit was particularly good in high noise conditions.
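The abstract does not say how the "simple measure of degree of energetic masking" is computed. As an illustration only, one way such a measure could be sketched is as the proportion of time-frequency cells in which the speech energy exceeds the noise energy by some margin. The function name `glimpse_proportion`, the 3 dB default threshold, and the plain list-of-lists energy representation are all assumptions made for this sketch, not details taken from the paper.

```python
import math


def glimpse_proportion(speech_energy, noise_energy, threshold_db=3.0):
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    Hypothetical helper, not the paper's implementation.
    speech_energy, noise_energy: equal-shaped lists of rows (one row per
    frequency channel) holding non-negative linear energies per time frame.
    threshold_db: assumed local SNR criterion for a cell to count as glimpsed.
    """
    total = 0
    glimpsed = 0
    for speech_row, noise_row in zip(speech_energy, noise_energy):
        for s, n in zip(speech_row, noise_row):
            total += 1
            if s <= 0:
                continue  # no speech energy: the cell cannot be glimpsed
            if n <= 0:
                glimpsed += 1  # no noise energy: the cell is fully audible
                continue
            local_snr_db = 10.0 * math.log10(s / n)
            if local_snr_db > threshold_db:
                glimpsed += 1
    if total == 0:
        raise ValueError("energy grids must contain at least one cell")
    return glimpsed / total


# Toy 2-channel, 2-frame example with unit noise energy everywhere.
speech = [[4.0, 1.0], [0.5, 10.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
proportion = glimpse_proportion(speech, noise)
```

In the toy example, two of the four cells have a local SNR above 3 dB (about 6 dB and 10 dB), so `proportion` is 0.5. Under this sketch, a lower proportion would correspond to heavier energetic masking of that speaker's material.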