This paper reports on a simplified system for determining vocal tract normalization. Such normalization has led to significant gains in recognition accuracy by reducing variability among speakers, which allows training data to be pooled and sharper models to be built. However, standard methods for determining the warp scale have been extremely cumbersome, generally requiring multiple recognition passes. We present a new system for warp scale selection that uses a simple generic voiced speech model to rapidly select appropriate frequency scales. The selection is sufficiently streamlined that it can be moved completely into the front-end processing. Using this system on a standard test of the Switchboard Corpus, we have achieved relative reductions in word error rate of 12% over unnormalized gender-independent models and 6% over our best unnormalized gender-dependent models.
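The selection step described above can be illustrated with a minimal sketch. It is not the system reported here: the abstract does not specify the form of the generic voiced speech model, the warping function, or the candidate scales, so the sketch assumes a single diagonal Gaussian over spectral bins, a linear rescaling of the frequency axis, and a small grid of candidate scales. The names `warp_spectrum`, `log_likelihood`, and `select_warp_scale` are hypothetical, and voiced-frame detection is assumed to have happened already.

```python
import math
from typing import List, Sequence


def warp_spectrum(spectrum: Sequence[float], alpha: float) -> List[float]:
    """Linearly rescale the frequency axis (an assumed warping function).

    Output bin k reads the input at position k * alpha, using linear
    interpolation and clamping at the last bin.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(k * alpha, n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        warped.append((1.0 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


def log_likelihood(frame: Sequence[float],
                   mean: Sequence[float],
                   var: Sequence[float]) -> float:
    """Log-likelihood of one frame under a diagonal Gaussian.

    The Gaussian is a stand-in for the generic voiced speech model.
    """
    total = 0.0
    for x, m, v in zip(frame, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
    return total


def select_warp_scale(voiced_frames: Sequence[Sequence[float]],
                      mean: Sequence[float],
                      var: Sequence[float],
                      candidates: Sequence[float]) -> float:
    """Return the candidate scale whose warped frames score highest."""
    def score(alpha: float) -> float:
        return sum(log_likelihood(warp_spectrum(f, alpha), mean, var)
                   for f in voiced_frames)
    return max(candidates, key=score)
```

Because each candidate scale needs only a warp and a model evaluation per voiced frame, with no recognition pass, a selection of this kind could in principle run inside the front end, which is the property the abstract emphasizes.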