Comparison of performance with voiced and whispered speech in word recognition and mean-formant-frequency discrimination

Authors:
Toshio Irino; Yoshie Aoki; Hideki Kawahara; Roy D. Patterson
Affiliations:
Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama 640-8510, Japan; Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama 640-8510, Japan; Faculty of Systems Engineering, Wakayama University, 930 Sakaedani, Wakayama 640-8510, Japan; Centre for the Neural Basis of Hearing, Department of Physiology, Development and Neuroscience, University of Cambridge, Downing Street, Cambridge CB2 3EG, United Kingdom
Venue:
Speech Communication
Year:
2012



Abstract

A recent series of studies has examined how glottal pulse rate (GPR) and mean formant frequency (MFF) interact in the perception of speaker characteristics and in speech recognition. This paper extends that research by comparing the recognition and discrimination performance achieved with voiced words to that achieved with whispered words. The recognition experiment shows that performance with whispered words is slightly worse than with voiced words at all MFFs when the GPR of the voiced words is in the middle of the normal range. However, as GPR decreases below this range, voiced-word performance decreases and eventually becomes worse than whispered-word performance. The discrimination experiment shows that the just-noticeable difference (JND) for MFF is essentially independent of the mode of vocal excitation; the JND is close to 5% for both voiced and whispered words for all speaker types. The interaction between GPR and vocal tract length (VTL) is interpreted in terms of the stability of the internal representation of speech, which improves with GPR across the range of values used in these experiments.