In this work, we present a speaker-normalization method based on the idea that the speaker-dependent scale factor can be separated out as a fixed translation factor in an alternate domain. We also introduce a non-linear frequency-scaling model motivated by the analysis of speech data. The proposed shift-based normalization is implemented using a maximum-likelihood (ML) search for the translation factor in the alternate domain. An advantage of this approach is that it shows the relationship between conventional frequency-warping-based vocal-tract length normalization (VTLN) methods and methods based on shifts in a psycho-acoustic scale, thus providing a unifying framework for speaker normalization. Additionally, in our approach it is simple to show that the shift required for normalization can be expressed as a linear transformation in the cepstral domain. This matters for computational efficiency, because the features need not be recomputed by redoing the signal processing for each scale or translation factor, as is usually done in conventional normalization. We present recognition results on a digit recognition task and show that the non-linear scaling model provides a relative improvement of 4% for adults and 7.5% for children compared to the linear-scaling model.
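The two central claims, that a shift in the warped domain is a linear transformation of the cepstra and that the shift can be chosen by an ML search, can be illustrated with a small sketch. It is not the authors' implementation. It assumes a log-spectrum sampled uniformly in the warped domain, a circular shift (so the identity holds exactly), full-length orthonormal DCT cepstra, and a single diagonal Gaussian as the reference model. The names `dct_matrix`, `shift_transform`, and `ml_shift` are hypothetical. For purely linear scaling the idea rests on log(αf) = log α + log f.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (n x n), so that C @ C.T is the identity."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def shift_transform(n, shift):
    """Cepstral-domain matrix equivalent to a circular shift of `shift` bins.

    Assumption: the shift is circular and all n cepstral coefficients are kept.
    """
    c = dct_matrix(n)
    p = np.roll(np.eye(n), shift, axis=0)  # p @ s == np.roll(s, shift)
    return c @ p @ c.T

def ml_shift(cepstrum, mean, var, candidates):
    """Grid search: return the candidate shift with the highest log-likelihood
    under a diagonal Gaussian (a stand-in for a real acoustic model)."""
    n = cepstrum.shape[0]

    def loglik(x):
        return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

    scores = [loglik(shift_transform(n, a) @ cepstrum) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Toy data: the "speaker" spectrum is the reference shifted by 3 bins.
rng = np.random.default_rng(0)
n = 24
C = dct_matrix(n)
s_ref = rng.normal(size=n)
s_spk = np.roll(s_ref, 3)

# The cepstra are transformed directly; the spectrum is never recomputed.
candidates = list(range(-5, 6))
best = ml_shift(C @ s_spk, mean=C @ s_ref, var=np.ones(n), candidates=candidates)
```

In this toy setting the search should recover the shift of -3 bins that undoes the speaker offset. Each candidate's matrix can be computed once in advance, which is the efficiency argument made in the abstract. Because the circular-shift matrix here is orthogonal, no Jacobian term is needed in the likelihood; a truncated or non-circular variant would be only approximate and may require one.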