Speaker normalization on conversational telephone speech

  • Authors:
  • S. Wegmann;D. McAllaster;J. Orloff;B. Peskin

  • Affiliations:
  • Dragon Syst. Inc., Newton, MA, USA;Dragon Syst. Inc., Newton, MA, USA;Dragon Syst. Inc., Newton, MA, USA;Dragon Syst. Inc., Newton, MA, USA

  • Venue:
  • ICASSP '96: Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing - Volume 01
  • Year:
  • 1996

Abstract

This paper reports on a simplified system for determining vocal tract normalization. Such normalization has led to significant gains in recognition accuracy by reducing variability among speakers, allowing the pooling of training data and the construction of sharper models. But standard methods for determining the warp scale have been extremely cumbersome, generally requiring multiple recognition passes. We present a new system for warp scale selection which uses a simple generic voiced speech model to rapidly select appropriate frequency scales. The selection is sufficiently streamlined that it can be moved completely into the front-end processing. Using this system on a standard test of the Switchboard Corpus, we have achieved relative reductions in word error rates of 12% over unnormalized gender-independent models and 6% over our best unnormalized gender-dependent models.
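
The abstract does not give the details of the selection procedure, so the following is only a rough sketch of the general idea of choosing a warp scale by scoring warped features against a generic model. Everything in it is an assumption made for illustration: the linear frequency warp, the single diagonal Gaussian standing in for the "generic voiced speech model", the grid of candidate scales, and the function names `warp_spectrum`, `log_likelihood`, and `select_warp`. It should not be read as the authors' implementation.

```python
import numpy as np

def warp_spectrum(spectrum, alpha):
    """Linearly rescale the frequency axis (assumed warp form).

    Output bin k takes the input value at position alpha * k,
    using linear interpolation and clamping at the top bin.
    """
    n = len(spectrum)
    bins = np.arange(n)
    src = np.clip(alpha * bins, 0, n - 1)
    return np.interp(src, bins, spectrum)

def log_likelihood(frames, mean, var):
    """Total log-likelihood of all frames under one diagonal Gaussian.

    The single Gaussian is a hypothetical stand-in for a generic
    voiced speech model.
    """
    diff = frames - mean
    return float(np.sum(-0.5 * (np.log(2 * np.pi * var) + diff ** 2 / var)))

def select_warp(frames, mean, var, alphas):
    """Score every candidate warp scale and return the best one with all scores."""
    scores = []
    for alpha in alphas:
        warped = np.array([warp_spectrum(f, alpha) for f in frames])
        scores.append(log_likelihood(warped, mean, var))
    best = float(alphas[int(np.argmax(scores))])
    return best, scores

# Synthetic illustration: a model spectrum with one smooth peak, and a
# "speaker" whose peak sits 10% higher in frequency than the model's.
n_bins = 128
bins = np.arange(n_bins)
model_mean = 1.0 + 5.0 * np.exp(-0.5 * ((bins - 40) / 8.0) ** 2)
model_var = np.ones(n_bins)
speaker_frames = np.array([warp_spectrum(model_mean, 1 / 1.1)] * 5)

candidates = np.linspace(0.8, 1.2, 21)  # assumed grid, step 0.02
best_alpha, all_scores = select_warp(speaker_frames, model_mean, model_var, candidates)
```

Because each candidate scale needs only one pass over the frames and one model evaluation, a search of this shape involves no recognition pass, which is the property the abstract emphasizes when it says the selection can live in the front end.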