A study of low-variance multi-taper features for distributed speech recognition

  • Authors:
  • Md Jahangir Alam;Patrick Kenny;Douglas O'Shaughnessy

  • Affiliations:
  • CRIM, Montreal, Canada and INRS-EMT, University of Quebec, Montreal, Canada;CRIM, Montreal, Canada;INRS-EMT, University of Quebec, Montreal, Canada

  • Venue:
  • NOLISP'11 Proceedings of the 5th international conference on Advances in nonlinear speech processing
  • Year:
  • 2011

Abstract

In this paper, we study low-variance multi-taper spectrum estimation methods for computing mel-frequency cepstral coefficient (MFCC) features for robust speech recognition. In speech recognition, MFCC features are usually computed from a Hamming-windowed DFT spectrum. Although windowing helps reduce the bias of the spectrum estimate, its variance remains high. Multi-taper spectrum estimation methods can be used to correct this shortcoming of single-taper (or single-window) spectrum estimation methods. Experimental results on the AURORA-2 corpus show that the multi-taper methods, specifically the multi-peak multi-taper method, perform better than the Hamming-windowed spectrum estimation method.
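The variance reduction described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the multi-peak tapers studied in the paper are not reproduced here, and the closed-form sine tapers with uniform weights are swapped in as a simple stand-in multi-taper family. The function names, the frame length of 256, the choice of 6 tapers, and the white-noise test signal are all assumptions made for illustration.

```python
import numpy as np


def sine_tapers(frame_len, num_tapers):
    """Orthonormal sine tapers (a stand-in for the paper's multi-peak tapers)."""
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    return np.sqrt(2.0 / (frame_len + 1)) * np.sin(np.pi * k * n / (frame_len + 1))


def multitaper_power_spectrum(frame, tapers, weights=None):
    """Weighted average of the power spectra obtained with each taper."""
    num_tapers = tapers.shape[0]
    if weights is None:
        # Assumption: uniform weights; other multi-taper methods use non-uniform ones.
        weights = np.full(num_tapers, 1.0 / num_tapers)
    eigenspectra = np.abs(np.fft.rfft(tapers * frame, axis=-1)) ** 2
    return weights @ eigenspectra


def hamming_power_spectrum(frame):
    """Single-taper baseline: Hamming window scaled to unit energy."""
    window = np.hamming(len(frame))
    window = window / np.sqrt(np.sum(window ** 2))
    return np.abs(np.fft.rfft(window * frame)) ** 2


# Compare estimator variance on synthetic white-noise frames (hypothetical test data).
rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 256))
tapers = sine_tapers(256, 6)

mt = np.array([multitaper_power_spectrum(f, tapers) for f in frames])
single = np.array([hamming_power_spectrum(f) for f in frames])

# Variance across frames, averaged over frequency bins.
mt_var = mt.var(axis=0).mean()
single_var = single.var(axis=0).mean()
print(f"multi-taper variance: {mt_var:.3f}, Hamming variance: {single_var:.3f}")
```

In an MFCC front end, either spectrum estimate would then be passed through the usual mel filterbank, logarithm, and DCT stages; only the spectrum estimation step differs between the two approaches.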