Speaker normalization using efficient frequency warping procedures

Authors:
Li Lee; R. C. Rose
Affiliations:
AT&T Bell Labs., Murray Hill, NJ, USA; AT&T Bell Labs., Murray Hill, NJ, USA
Venue:
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 01
Year:
1996

Cited by 21 publications:

Multi-speaker articulatory trajectory formation based on speaker-independent articulatory HMMs. Speech Communication.
Using multiple acoustic feature sets for speech recognition. Speech Communication.
Automatic speech recognition and speech variability: A review. Speech Communication.
Acoustic variability and automatic recognition of children's speech. Speech Communication.
The application of hidden Markov models in speech recognition. Foundations and Trends in Signal Processing.
Limited-Vocabulary Estonian Continuous Speech Recognition System using Hidden Markov Models. Informatica.
Advances in Acoustic Modeling for the Recognition of Czech. TSD '08: Proceedings of the 11th International Conference on Text, Speech and Dialogue.
Towards an intelligent acoustic front end for automatic speech recognition: built-in speaker normalization. EURASIP Journal on Audio, Speech, and Music Processing (Intelligent Audio, Speech, and Music Processing Applications).
An on-line speaker adaptation method for HMM-based speech recognizers. Acta Cybernetica.
Towards age-independent acoustic modeling. Speech Communication.
A new method for mispronunciation detection using Support Vector Machine based on Pronunciation Space Models. Speech Communication.
Improved automatic speech recognition through speaker normalization. Computer Speech and Language.
An automatic retraining method for speaker independent hidden Markov models. TSD '07: Proceedings of the 10th International Conference on Text, Speech and Dialogue.
Error approximation and minimum phone error acoustic model estimation. IEEE Transactions on Audio, Speech, and Language Processing.
Unsupervised equalization of Lombard effect for speech recognition in noisy adverse environments. IEEE Transactions on Audio, Speech, and Language Processing.
Statistical transformation of language and pronunciation models for spontaneous speech recognition. IEEE Transactions on Audio, Speech, and Language Processing.
Robust speech recognition based on dereverberation parameter optimization using acoustic model likelihood. IEEE Transactions on Audio, Speech, and Language Processing (Special issue on processing reverberant speech: methodologies and applications).
Exploring the effect of differences in the acoustic correlates of adults' and children's speech in the context of automatic speech recognition. EURASIP Journal on Audio, Speech, and Music Processing (Special issue on atypical speech).
Pitch mean based frequency warping. ISCSLP '06: Proceedings of the 5th International Conference on Chinese Spoken Language Processing.
Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech. Pattern Recognition Letters.
Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives. Speech Communication.

Abstract

In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. A set of low-complexity, maximum-likelihood-based frequency warping procedures has been applied to speaker normalization for a telephone-based connected digit recognition task. This paper presents an efficient means of estimating a linear frequency warping factor and a simple mechanism for implementing frequency warping by modifying the filter bank in mel-frequency cepstrum feature analysis. An experimental study comparing these techniques with other well-known techniques for reducing variability is described. The results show that frequency warping consistently reduced word error rate by 20%, even for very short utterances.
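The abstract describes implementing frequency warping by modifying the filter bank of mel-frequency cepstrum analysis rather than resampling the signal. That idea might be sketched as follows. This is a minimal illustration only, and it rests on assumptions the abstract does not state: triangular filters equally spaced on the mel scale (2595 * log10(1 + f/700)), a single linear factor `alpha` applied to every filter edge, and clipping at the Nyquist frequency. The function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """Map frequency in Hz to the mel scale: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def warped_mel_filterbank(n_filters, n_fft, sample_rate, alpha, f_low=0.0, f_high=None):
    """Triangular mel filter bank with every edge frequency scaled by alpha.

    Returns n_filters rows of n_fft // 2 + 1 weights (one per FFT bin).
    alpha = 1.0 gives the ordinary, unwarped filter bank.
    """
    nyquist = sample_rate / 2.0
    if f_high is None:
        f_high = nyquist
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    # n_filters + 2 points equally spaced in mel define the triangle edges.
    edges = [mel_to_hz(m_low + (m_high - m_low) * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    # Assumed linear warp: move every edge by alpha, clipped to [0, Nyquist].
    edges = [min(max(alpha * f, 0.0), nyquist) for f in edges]

    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if left < f <= center:      # rising slope of the triangle
                row.append((f - left) / (center - left))
            elif center < f < right:    # falling slope of the triangle
                row.append((right - f) / (right - center))
            else:
                row.append(0.0)
        bank.append(row)
    return bank
```

Each row would then be applied to a frame's power spectrum, followed by the usual log and DCT steps of MFCC analysis. Only the filter positions depend on `alpha`, so one FFT per frame serves every candidate warp. For `alpha > 1`, the highest filters are pushed against the Nyquist frequency and can collapse to all-zero rows. For that reason, practical front ends often use a piecewise-linear warp near the band edge instead of a purely linear one.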
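Maximum-likelihood estimation of a linear warping factor can be pictured as a search over candidate factors. For each candidate `alpha`, the features are recomputed with that warp and scored against the acoustic model, and the highest-scoring `alpha` is kept. Several pieces of the sketch below are assumptions made for illustration, not details from the abstract: the candidate grid (0.88 to 1.12 in steps of 0.02), the callables `extract_features` and `log_likelihood`, and the toy formant/Gaussian scorer. In a recognizer, the features would be warped MFCCs. The score would come from the HMMs aligned against a transcription or a first-pass recognition hypothesis.

```python
import math


def estimate_warp_factor(utterance, alphas, extract_features, log_likelihood):
    """Return the alpha in `alphas` whose warped features score highest.

    extract_features(utterance, alpha) -> features and
    log_likelihood(features) -> float are hypothetical callables standing in
    for warped feature analysis and acoustic-model scoring.
    """
    best_alpha, best_score = None, -math.inf
    for alpha in alphas:
        score = log_likelihood(extract_features(utterance, alpha))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


# Assumed grid of 13 candidate factors from 0.88 to 1.12 in steps of 0.02.
ALPHAS = [round(0.88 + 0.02 * i, 2) for i in range(13)]

# Toy stand-ins: an "utterance" is a list of formant frequencies in Hz, and the
# "model" is an independent Gaussian per formant centered on reference values.
REF_FORMANTS = [500.0, 1500.0, 2500.0]
SIGMA = 100.0


def toy_features(formants, alpha):
    # Rescale the frequency axis by 1 / alpha.
    return [f / alpha for f in formants]


def toy_log_likelihood(features):
    # Sum of per-formant Gaussian log densities.
    return sum(-0.5 * ((x - mu) / SIGMA) ** 2 - math.log(SIGMA * math.sqrt(2 * math.pi))
               for x, mu in zip(features, REF_FORMANTS))


speaker = [1.1 * f for f in REF_FORMANTS]  # toy speaker: formants 10% above reference
best = estimate_warp_factor(speaker, ALPHAS, toy_features, toy_log_likelihood)
print(best)  # 1.1
```

The search costs one feature extraction and one scoring pass per grid point, which is cheap for a grid this small. The toy speaker's formants sit 10% above the reference, so under this convention the search recovers `alpha = 1.1`.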