Human Speech Perception: Some Lessons from Automatic Speech Recognition

Authors:
Hynek Hermansky
Venue:
TSD '01 Proceedings of the 4th International Conference on Text, Speech and Dialogue
Year:
2001

Abstract

We show that data-guided techniques optimized for classification of speech sounds into context-independent phoneme classes yield auditory-like frequency resolution and enhanced sensitivity to modulation frequencies in the 1-15 Hz range. Next, we present a viable recognition paradigm in which temporal trajectories of spectral energies in individual critical bands are used to estimate the likelihoods of phoneme classes. The relative success of this technique leads to a discussion of the auditory basis of the human speech communication process. Overall, we argue against a spectral-envelope-based linguistic code in communication by speech.
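
As a rough illustration of the input such a paradigm might operate on, the sketch below cuts a fixed-length temporal trajectory out of each critical band around every frame. It is not the paper's implementation: the function name `band_trajectories`, the `context` length of 50 frames, and the edge-padding choice are all assumptions made for this example, and the per-band classifier that would map each trajectory to phoneme-class likelihoods is left out.

```python
import numpy as np

def band_trajectories(log_energies, context=50):
    """Cut a temporal trajectory around every frame, separately per band.

    log_energies: array of shape (n_frames, n_bands), for example log
                  critical-band energies (an assumed input format).
    context:      frames taken on each side of the centre frame
                  (illustrative value, not taken from the abstract).
    Returns an array of shape (n_frames, n_bands, 2 * context + 1).
    """
    n_frames, n_bands = log_energies.shape
    # Repeat the first and last frames so every frame has full context.
    padded = np.pad(log_energies, ((context, context), (0, 0)), mode="edge")
    width = 2 * context + 1
    out = np.empty((n_frames, n_bands, width))
    for t in range(n_frames):
        # padded[t:t + width] is centred on original frame t;
        # transpose so each row is the trajectory of one band.
        out[t] = padded[t:t + width].T
    return out
```

Each `out[t, b]` is the time course of band `b` around frame `t`. In a setup like the one the abstract describes, each such vector would be passed to a classifier dedicated to that band.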