Improved methods for vocal tract normalization

Authors:
L. Welling;S. Kanthak;H. Ney
Affiliations:
Tech. Hochschule Aachen, Germany;-;-
Venue:
ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 02
Year:
1999

Cited by 8

Acoustic variability and automatic recognition of children's speech
Speech Communication

Highly accurate children's speech recognition for interactive reading tutors using subword units
Speech Communication

Access to recorded interviews: A research agenda
Journal on Computing and Cultural Heritage (JOCCH)

Towards age-independent acoustic modeling
Speech Communication

Advances in children's speech recognition within an interactive literacy tutor
HLT-NAACL-Short '04 Proceedings of HLT-NAACL 2004: Short Papers

Improved automatic speech recognition through speaker normalization
Computer Speech and Language

Comparison of methods for language-dependent and language-independent query-by-example spoken term detection
ACM Transactions on Information Systems (TOIS)

Aging speech recognition with speaker adaptation techniques: Study on medium vocabulary continuous Bengali speech
Pattern Recognition Letters

Abstract

This paper presents improved methods for vocal tract normalization (VTN) along with experimental tests on three databases. We propose a new method for VTN in training: by using acoustic models with single Gaussian densities per state for selecting the normalization scales, the need for the models to learn the normalization scales of the training speakers is avoided. We show that using single Gaussian densities for selecting the normalization scales in training results in lower error rates than using mixture densities. For VTN in recognition, we propose an improvement of the well-known multiple-pass strategy: by using an unnormalized acoustic model for the first recognition pass instead of a normalized model, lower error rates are obtained. In recognition tests, this method is compared with a fast variant of VTN. The multiple-pass strategy is an efficient method, but it is suboptimal because the normalization scale and the word sequence are determined sequentially. We found that for telephone digit string recognition, this suboptimality reduces the VTN gain in recognition performance by 30% relative. On the German spontaneous scheduling task Verbmobil, the WSJ task, and the German telephone digit string corpus SieTill, the proposed methods for VTN reduce the error rates significantly.
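
The scale-selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the scalar toy features, the purely linear frequency scaling, and the grid of candidate scales are all assumptions made for illustration, and the Jacobian term of the warping is omitted for simplicity. Given a state alignment (for example from a first recognition pass), the sketch picks the scale that maximizes the likelihood under one single Gaussian per state.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log density of a scalar x under a single Gaussian N(mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def select_warp_factor(frames, alignment, states, grid):
    """Grid search for the normalization scale (hypothetical helper).

    frames:    scalar features, a toy stand-in for a spectral peak in Hz
    alignment: one state index per frame, e.g. from a first recognition pass
    states:    list of (mean, var) pairs, one single Gaussian per state
    grid:      candidate scales to try
    """
    best_alpha, best_score = None, -math.inf
    for alpha in grid:
        # Warp every frame by alpha and score it against its aligned state.
        score = sum(
            gaussian_loglik(alpha * x, *states[s])
            for x, s in zip(frames, alignment)
        )
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha

# Toy setup (all values are assumptions, not taken from the paper).
states = [(500.0, 50.0 ** 2), (1500.0, 100.0 ** 2)]
grid = [round(0.88 + 0.02 * i, 2) for i in range(13)]  # 0.88 to 1.12

# A speaker whose frequencies lie 10% above the model's means.
true_scale = 1.10
frames = [500.0 * true_scale, 1500.0 * true_scale, 500.0 * true_scale]
alignment = [0, 1, 0]

alpha = select_warp_factor(frames, alignment, states, grid)
print("selected scale:", alpha)
```

In a multiple-pass setup of the kind the abstract describes, the alignment would come from a first pass, the selected scale would be used to warp the utterance, and a second pass would decode the warped features with the normalized model. Because the scale is chosen for a fixed alignment, scale and word sequence are determined sequentially, which is the source of the suboptimality the abstract reports.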