This study compared listeners' performance on a multispeaker speech-in-noise task with that of a model inspired by automatic speech recognition techniques. Listeners identified three keywords in simple 6-word sentences presented in speech-shaped noise at a range of signal-to-noise ratios (SNRs). Sentence material was provided by 18 male and 16 female speakers. An across-speaker analysis of several acoustic parameters (vocal tract length, mean fundamental frequency and speaking rate) found none of them to be a consistently good predictor of relative intelligibility. A simple measure of the degree of energetic masking was a good predictor of female speech intelligibility, especially in high noise conditions, but failed to account for interspeaker differences within the male group. A glimpsing model, which combined a simulation of energetic masking with speaker-dependent statistical models, produced recognition scores that were fitted to the behavioural data pooled across all speakers. Using a single set of speaker-independent, noise-level-independent parameters, the model not only predicted the intelligibility of individual speakers to a remarkable degree but also accounted for most of the token-wise intelligibilities of the letter keywords. The fit was particularly good in high noise conditions.
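The abstract does not specify how the noise was mixed or how the "simple measure of energetic masking" was computed. What follows is a minimal sketch under stated assumptions. Noise is scaled so that the global speech-to-noise power ratio hits a target SNR. Energetic masking is then summarised as a glimpse proportion: the fraction of time-frequency cells in which speech energy exceeds noise energy by a local SNR threshold. The 3 dB default is an assumed value. The speech and noise energy grids are assumed to be precomputed on the same time-frequency representation, for example an auditory filterbank, which is not shown here. The helper names `scale_noise_for_snr`, `mix_at_snr` and `glimpse_proportion` are hypothetical and are not taken from the study.

```python
import math
from typing import List, Sequence

EPS = 1e-12  # floor that avoids division by zero and log of zero


def mean_power(signal: Sequence[float]) -> float:
    """Mean squared amplitude of a non-empty signal."""
    return sum(x * x for x in signal) / len(signal)


def scale_noise_for_snr(speech: Sequence[float], noise: Sequence[float],
                        snr_db: float) -> List[float]:
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals snr_db."""
    target_noise_power = mean_power(speech) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / max(mean_power(noise), EPS))
    return [gain * x for x in noise]


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Add the scaled noise to the speech sample by sample (equal lengths assumed)."""
    scaled = scale_noise_for_snr(speech, noise, snr_db)
    return [s + n for s, n in zip(speech, scaled)]


def glimpse_proportion(speech_energy: Sequence[Sequence[float]],
                       noise_energy: Sequence[Sequence[float]],
                       threshold_db: float = 3.0) -> float:
    """Fraction of time-frequency cells whose local SNR exceeds threshold_db.

    Assumption: both inputs are energy grids (rows = frequency channels,
    columns = time frames) computed with the same front end.
    """
    total = glimpsed = 0
    for s_row, n_row in zip(speech_energy, noise_energy):
        for s, n in zip(s_row, n_row):
            local_snr_db = 10.0 * math.log10(max(s, EPS) / max(n, EPS))
            total += 1
            if local_snr_db > threshold_db:
                glimpsed += 1
    return glimpsed / total if total else 0.0
```

Consider a 2×2 grid where only one cell has speech 10 dB above the noise. For that grid, `glimpse_proportion([[10.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 10.0]])` returns 0.25. A higher proportion indicates less energetic masking. Lowering the global SNR with `mix_at_snr` reduces the number of cells that clear the threshold. This is one plausible reason why a measure of this kind would track intelligibility most closely in high noise conditions.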