This paper presents improved methods for vocal tract normalization (VTN), along with experimental tests on three databases. We propose a new method for VTN in training: by selecting the normalization scales with acoustic models that use a single Gaussian density per state, the need for the models to learn the normalization scales of the training speakers is avoided. We show that selecting the normalization scales in training with single Gaussian densities yields lower error rates than selecting them with mixture densities. For VTN in recognition, we propose an improvement of the well-known multiple-pass strategy: using an unnormalized acoustic model instead of a normalized model for the first recognition pass gives lower error rates. In recognition tests, this method is compared with a fast variant of VTN. The multiple-pass strategy is efficient, but it is suboptimal because the normalization scale and the word sequence are determined sequentially. We found that, for telephone digit string recognition, this suboptimality reduces the VTN gain in recognition performance by 30% relative. On the German spontaneous scheduling task Verbmobil, the WSJ task, and the German telephone digit string corpus SieTill, the proposed methods for VTN reduce the error rates significantly.
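
As a rough illustration of what selecting a normalization scale can look like, the following minimal sketch runs a grid search over candidate scales and keeps the one that maximizes the likelihood under a single Gaussian. Everything in it is an assumption made for illustration and is not taken from the paper: the piecewise-linear warping function, the knee position, the scale grid, the one-dimensional toy features (raw frequencies rather than cepstral vectors), and the names `warp_frequency`, `select_scale`, and `ALPHAS`. It also ignores the Jacobian of the warping, as simple grid-search schemes often do.

```python
import math


def warp_frequency(f, alpha, f_max=4000.0, knee=0.875):
    """Assumed piecewise-linear warp: scale by alpha up to a knee,
    then interpolate linearly so that f_max still maps to f_max."""
    f0 = knee * f_max if alpha <= 1.0 else knee * f_max / alpha
    if f <= f0:
        return alpha * f
    # Linear segment joining (f0, alpha * f0) to (f_max, f_max).
    return alpha * f0 + (f_max - alpha * f0) * (f - f0) / (f_max - f0)


def gaussian_loglik(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def select_scale(observations, mean, var, alphas):
    """Return the candidate scale with the highest total log-likelihood
    of the warped observations under a single Gaussian."""
    def total(alpha):
        return sum(
            gaussian_loglik(warp_frequency(f, alpha), mean, var)
            for f in observations
        )
    return max(alphas, key=total)


# Assumed grid of candidate scales: 0.88, 0.90, ..., 1.12.
ALPHAS = [round(0.88 + 0.02 * i, 2) for i in range(13)]

# Toy speaker whose frequencies sit about 10% below the model mean of 1000 Hz.
toy_observations = [1000.0 / 1.1 + d for d in (-20.0, 0.0, 20.0)]
best_alpha = select_scale(toy_observations, mean=1000.0, var=100.0 ** 2, alphas=ALPHAS)
print(best_alpha)
```

For this toy speaker the search settles on a scale of 1.10, which stretches the observed frequencies back toward the model mean. In a multiple-pass setting, the Gaussian would be replaced by the state-level models aligned to a preliminary transcription from the first pass, and the chosen scale would then be used for the second recognition pass.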