Speaker normalization on conversational telephone speech

Authors:
S. Wegmann;D. McAllaster;J. Orloff;B. Peskin
Affiliations:
Dragon Syst. Inc., Newton, MA, USA;Dragon Syst. Inc., Newton, MA, USA;Dragon Syst. Inc., Newton, MA, USA;Dragon Syst. Inc., Newton, MA, USA
Venue:
ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
Year:
1996

Citing: 0
Cited by: 13

Large-Vocabulary Speech Recognition Algorithms
Computer

Feature vs. Model Based Vocal Tract Length Normalization for a Speech Recognition-Based Interactive Toy
AMT '01 Proceedings of the 6th International Computer Science Conference on Active Media Technology

Classifier combination schemes in speech impediment therapy systems
Acta Cybernetica

Acoustic variability and automatic recognition of children's speech
Speech Communication

A shift-based approach to speaker normalization using non-linear frequency-scaling model
Speech Communication

Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC
Computer Speech and Language

Towards age-independent acoustic modeling
Speech Communication

A new method for mispronunciation detection using Support Vector Machine based on Pronunciation Space Models
Speech Communication

Improved automatic speech recognition through speaker normalization
Computer Speech and Language

Speaker normalization via springy discriminant analysis and pitch estimation
TSD'07 Proceedings of the 10th international conference on Text, speech and dialogue

Advances in Mandarin broadcast speech transcription at IBM under the DARPA GALE program
ISCSLP'06 Proceedings of the 5th international conference on Chinese Spoken Language Processing

The IBM rich transcription spring 2006 speech-to-text system for lecture meetings
MLMI'06 Proceedings of the Third international conference on Machine Learning for Multimodal Interaction

Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
Pattern Recognition Letters

Quantified Score

Hi-index	0.01

Abstract

This paper reports on a simplified system for determining vocal tract normalization. Such normalization has led to significant gains in recognition accuracy by reducing variability among speakers, allowing training data to be pooled and sharper models to be constructed. Standard methods for determining the warp scale, however, have been extremely cumbersome, generally requiring multiple recognition passes. We present a new system for warp scale selection that uses a simple generic voiced speech model to rapidly select appropriate frequency scales. The selection is sufficiently streamlined that it can be moved completely into the front-end processing. Using this system on a standard test of the Switchboard Corpus, we have achieved relative reductions in word error rate of 12% over unnormalized gender-independent models and 6% over our best unnormalized gender-dependent models.
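The abstract does not give the details of the selection procedure, so the following is only a minimal sketch of the general idea of picking a warp scale by scoring against a generic model, not the authors' implementation. It assumes a linear rescaling of the frequency axis, a single diagonal Gaussian standing in for the "generic voiced speech model", and a small grid of candidate scales; the function names `warp_spectrum`, `log_likelihood`, and `select_warp` are hypothetical, and the caller is assumed to pass in only voiced frames.

```python
import math


def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis (assumed warp form).

    Output bin k reads the input at position k * alpha, using linear
    interpolation and clamping at the last bin.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k * alpha, n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def log_likelihood(frame, mean, var):
    """Log-likelihood of one frame under a diagonal Gaussian (assumed model)."""
    total = 0.0
    for x, m, v in zip(frame, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_warp(voiced_frames, mean, var, candidates):
    """Return the candidate scale whose warped frames score highest.

    The first candidate wins ties. No Jacobian correction is applied,
    which is a simplification of this sketch.
    """
    best_alpha, best_score = None, -math.inf
    for alpha in candidates:
        score = sum(
            log_likelihood(warp_spectrum(f, alpha), mean, var)
            for f in voiced_frames
        )
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha


# Toy usage: a frame that already matches the model mean should need no warping.
reference = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0]
chosen = select_warp([reference], reference, [1.0] * 8, [0.9, 1.0, 1.1])
```

Because the search scores feature frames directly against one fixed model, it needs no recognition pass, which is consistent with the abstract's point that the selection can live in the front end.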