A study of low-variance multi-taper features for distributed speech recognition

Authors:
Md Jahangir Alam;Patrick Kenny;Douglas O'Shaughnessy
Affiliations:
CRIM, Montreal, Canada and INRS-EMT, University of Quebec, Montreal, Canada;CRIM, Montreal, Canada;INRS-EMT, University of Quebec, Montreal, Canada
Venue:
NOLISP'11 Proceedings of the 5th international conference on Advances in nonlinear speech processing
Year:
2011

References (3):

Optimal cepstrum estimation using multiple windows
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing

Minimum bias multiple taper spectral estimation
IEEE Transactions on Signal Processing

A multiple window method for estimation of peaked spectra
IEEE Transactions on Signal Processing

Cited by (1):

Multitaper MFCC and PLP features for speaker verification using i-vectors
Speech Communication

Abstract

In this paper we study low-variance multi-taper spectrum estimation methods for computing mel-frequency cepstral coefficient (MFCC) features for robust speech recognition. In speech recognition, MFCC features are usually computed from a Hamming-windowed DFT spectrum. Although windowing reduces the bias of the spectrum estimate, its variance remains high. Multi-taper spectrum estimation methods can correct this shortcoming of single-taper (single-window) methods. Experimental results on the AURORA-2 corpus show that the multi-taper methods, and the multi-peak multi-taper method in particular, outperform the Hamming-windowed spectrum estimation method.
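The variance argument in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses sine tapers with uniform weights as a simple stand-in, not the multi-peak tapers evaluated in the paper, and the function names, the taper count `k=6`, and the 32-sample white-noise frames are all assumptions chosen for illustration. The sketch averages the power spectra obtained with several orthonormal tapers and compares the frame-to-frame variance of one frequency bin against a single Hamming-windowed estimate.

```python
import cmath
import math
import random
import statistics


def sine_tapers(n, k):
    """Return k orthonormal sine tapers of length n (illustrative choice)."""
    norm = math.sqrt(2.0 / (n + 1))
    return [
        [norm * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1)) for t in range(n)]
        for j in range(k)
    ]


def power_spectrum(frame, taper):
    """Squared DFT magnitude of the tapered frame at bins 0..n/2."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, taper)]
    spec = []
    for f in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def hamming_spectrum(frame):
    """Single-taper baseline: Hamming window scaled to unit energy."""
    n = len(frame)
    w = [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]
    scale = math.sqrt(sum(v * v for v in w))
    return power_spectrum(frame, [v / scale for v in w])


def multitaper_spectrum(frame, k=6):
    """Uniformly weighted average of k single-taper power spectra."""
    spectra = [power_spectrum(frame, tp) for tp in sine_tapers(len(frame), k)]
    return [sum(col) / k for col in zip(*spectra)]


def variance_at_bin(estimator, frames, b):
    """Variance across frames of the estimated power at frequency bin b."""
    return statistics.pvariance([estimator(fr)[b] for fr in frames])


if __name__ == "__main__":
    random.seed(0)
    # White Gaussian noise frames: the true spectrum is flat.
    frames = [[random.gauss(0.0, 1.0) for _ in range(32)] for _ in range(200)]
    print("Hamming variance:    ", variance_at_bin(hamming_spectrum, frames, 8))
    print("Multi-taper variance:", variance_at_bin(multitaper_spectrum, frames, 8))
```

In an MFCC front end, the averaged spectrum would take the place of the Hamming-windowed spectrum before the mel filterbank, with the later stages unchanged. The variance reduction comes at the cost of some frequency resolution, which grows coarser as the number of tapers increases.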