A shift-based approach to speaker normalization using non-linear frequency-scaling model

  • Authors: Rohit Sinha; S. Umesh

  • Affiliations: Department of Electrical Engineering, Indian Institute of Technology, Kanpur 208 016, India (both authors)

  • Venue: Speech Communication
  • Year: 2008

Abstract

In this work, we present a speaker-normalization method based on the idea that the speaker-dependent scale factor can be separated out as a fixed translation factor in an alternate domain. We also introduce a non-linear frequency-scaling model motivated by the analysis of speech data. The proposed shift-based normalization approach is implemented using a maximum-likelihood (ML) search for the translation factor in the alternate domain. An advantage of our approach is that it exposes the relationship between conventional frequency-warping-based vocal-tract length normalization (VTLN) methods and methods based on shifts in a psycho-acoustic scale, thus providing a unifying framework for speaker normalization. Additionally, in our approach it is simple to show that the shifting required for normalization can be expressed as a linear transformation in the cepstral domain. This is important for computational efficiency, since the features do not have to be recomputed by redoing the signal processing for each scale/translation factor, as is usually done in conventional normalization. We present recognition results using the proposed approach on a digit-recognition task and show that the non-linear scaling model provides a relative improvement of 4% for adults and 7.5% for children when compared to the linear-scaling model.
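
The central idea, that a speaker-dependent scale factor becomes a fixed translation in an alternate domain, can be illustrated with a small sketch. It covers only the linear-scaling case, where the alternate domain is assumed to be log-frequency, since log(αf) = log α + log f. The non-linear model proposed in the paper is not reproduced here. The toy spectral envelope, the formant values, and the function names are all hypothetical, and a squared-error grid search stands in for the paper's maximum-likelihood search.

```python
import numpy as np

def spectrum(freq_hz, formants=(700.0, 1200.0, 2600.0), width=0.08):
    """Toy spectral envelope (hypothetical): Gaussian bumps in log-frequency."""
    log_f = np.log(freq_hz)
    return sum(np.exp(-0.5 * ((log_f - np.log(fc)) / width) ** 2)
               for fc in formants)

# Uniform grid in the assumed alternate domain (log-frequency).
lam = np.linspace(np.log(200.0), np.log(4000.0), 512)
step = lam[1] - lam[0]

alpha_true = 1.15                       # assumed speaker scale factor
reference = spectrum(np.exp(lam))       # reference speaker
# Linear scaling model: S_spk(f) = S_ref(f / alpha), i.e. formants move to alpha * f.
speaker = spectrum(np.exp(lam) / alpha_true)

def best_shift(ref, obs, max_shift=40):
    """Grid search over integer shifts, minimising mean squared error
    on the overlapping samples (a stand-in for an ML search)."""
    best_k, best_err = 0, np.inf
    for k in range(-max_shift, max_shift + 1):
        if k >= 0:
            a, b = ref[: len(ref) - k], obs[k:]
        else:
            a, b = ref[-k:], obs[: len(obs) + k]
        err = np.mean((a - b) ** 2)
        if err < best_err:
            best_k, best_err = k, err
    return best_k

k_hat = best_shift(reference, speaker)
alpha_hat = float(np.exp(k_hat * step))  # shift in log-frequency -> scale factor
print(f"estimated shift: {k_hat} bins, estimated scale factor: {alpha_hat:.3f}")
```

In this sketch the recovered shift maps back to a scale factor through the exponential, which is accurate only to within the resolution of the log-frequency grid; a finer grid or interpolation between bins would tighten the estimate.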