Robust Feature Extraction for Continuous Speech Recognition Using the MVDR Spectrum Estimation Method

Authors:
Satya Dharanipragada;Umit H. Yapanel;Bhaskar D. Rao
Affiliations:
Citadel Investment Group, Chicago, IL;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2007

Cited by (9):
Stabilised weighted linear prediction. Speech Communication.
Hybrid wavelet based LPC features for Hindi speech recognition. International Journal of Information and Communication Technology.
Signal adaptive spectral envelope estimation for robust speech recognition. Speech Communication.
Robust MVDR-based feature extraction for speech recognition. ICICS'09: Proceedings of the 7th International Conference on Information, Communications and Signal Processing.
Feature selection using singular value decomposition and QR factorization with column pivoting for text-independent speaker identification. Speech Communication.
Robustness evaluation of wavelet based features for continuous speech recognition. International Journal of Intelligent Systems Technologies and Applications.
Detection of landmines and underground utilities from acoustic and GPR images with a cepstral approach. Journal of Visual Communication and Image Representation.
Fingerprint recognition using mel-frequency cepstral coefficients. Pattern Recognition and Image Analysis.
Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification. International Journal of Speech Technology.

Abstract

This paper describes a robust feature extraction technique for continuous speech recognition. Central to the technique is the minimum variance distortionless response (MVDR) method of spectrum estimation. We consider incorporating perceptual information in two ways: 1) after the MVDR power spectrum is computed and 2) directly during the MVDR spectrum estimation. We show that incorporating perceptual information directly into the spectrum estimation significantly improves both robustness and computational efficiency. We analyze the class separability and speaker variability properties of the features using a Fisher linear discriminant measure and show that these features provide better class separability and better suppression of speaker-dependent information than the widely used mel frequency cepstral coefficient (MFCC) features. We evaluate the technique on four tasks: an in-car speech recognition task, the Aurora-2 matched task, the Wall Street Journal (WSJ) task, and the Switchboard task. In most cases, the new feature extraction technique gives lower word error rates than the MFCC and perceptual linear prediction (PLP) feature extraction techniques. Statistical significance tests reveal that the improvement is most significant in high-noise conditions. The technique thus provides improved robustness to noise without sacrificing performance in clean conditions.
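The abstract names MVDR spectrum estimation as the core of the technique but does not show how the spectrum is computed. The sketch below assumes the standard formulation: for model order M, P_MV(ω) = 1 / (v(ω)^H R^{-1} v(ω)), where R is the (M+1)×(M+1) Toeplitz autocorrelation matrix and v(ω) = [1, e^{jω}, ..., e^{jMω}]^T. It evaluates this quantity with Musicus's closed form from the linear prediction (LP) coefficients a_0 = 1, a_1, ..., a_M and prediction-error power P_e: μ(k) = (1/P_e) Σ_{i=0}^{M-k} (M+1-k-2i) a_i a_{i+k}, and P_MV(ω) = 1 / Σ_{k=-M}^{M} μ(k) e^{-jωk}. All function names, the model order, and the synthetic test frame are hypothetical, and the perceptual (warped) variants described in the abstract are not shown.

```python
import numpy as np


def autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Biased autocorrelation estimates r[0], ..., r[order] of a windowed frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)]) / n


def levinson_durbin(r: np.ndarray, order: int):
    """LP polynomial a (with a[0] = 1) and prediction-error power P_e from r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        k = -np.dot(a[:m], r[m:0:-1]) / err  # reflection coefficient of stage m
        a[: m + 1] = a[: m + 1] + k * a[m::-1]
        err *= 1.0 - k * k
    return a, err


def mvdr_spectrum(a: np.ndarray, err: float, n_freqs: int = 257):
    """Order-M MVDR spectrum on n_freqs points in [0, pi] via the Musicus closed form."""
    order = len(a) - 1
    mu = np.zeros(order + 1)
    for k in range(order + 1):
        i = np.arange(order - k + 1)
        mu[k] = np.sum((order + 1 - k - 2 * i) * a[i] * a[i + k]) / err
    omega = np.linspace(0.0, np.pi, n_freqs)
    lags = np.arange(1, order + 1)
    # Real signal: mu(-k) = mu(k), so the denominator reduces to a cosine series
    denom = mu[0] + 2.0 * np.cos(np.outer(omega, lags)) @ mu[1:]
    return omega, 1.0 / denom


def mvdr_spectrum_direct(r: np.ndarray, omega: np.ndarray) -> np.ndarray:
    """Reference definition 1 / (v^H R^{-1} v), O(M^3) per call; used only as a check."""
    m = np.arange(len(r))
    big_r = r[np.abs(m[:, None] - m[None, :])]  # symmetric Toeplitz autocorrelation matrix
    v = np.exp(1j * np.outer(m, omega))  # one steering vector per frequency (column)
    quad = np.sum(v.conj() * np.linalg.solve(big_r, v), axis=0).real
    return 1.0 / quad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical test frame: white noise through a stable two-pole resonator
    e = rng.standard_normal(400)
    x = np.zeros_like(e)
    for n in range(len(e)):
        x[n] = e[n]
        if n >= 1:
            x[n] += 1.6 * x[n - 1]
        if n >= 2:
            x[n] -= 0.9 * x[n - 2]
    frame = x * np.hamming(len(x))

    order = 12  # hypothetical model order
    r = autocorrelation(frame, order)
    a, err = levinson_durbin(r, order)
    omega, p_fast = mvdr_spectrum(a, err)
    print(np.allclose(p_fast, mvdr_spectrum_direct(r, omega)))  # True
```

Once the LP coefficients are known, the closed form needs only M+1 coefficients μ(k) and a short cosine series per frequency, which is why it is preferred over inverting R; the direct matrix form is kept here purely as a cross-check.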
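The abstract refers to a Fisher linear discriminant measure without defining it. One common form, assumed here and not necessarily the exact variant used in the paper, is J = trace(S_w^{-1} S_b), where S_w and S_b are the within-class and between-class scatter matrices of the feature vectors. Computed with phonetic class labels, a larger J indicates better class separability; computed with speaker labels, a smaller J indicates that less speaker-dependent information remains. The function names and synthetic data below are hypothetical.

```python
import numpy as np


def scatter_matrices(features: np.ndarray, labels: np.ndarray):
    """Within-class (S_w) and between-class (S_b) scatter matrices, each normalized by N."""
    n, d = features.shape
    global_mean = features.mean(axis=0)
    s_w = np.zeros((d, d))
    s_b = np.zeros((d, d))
    for c in np.unique(labels):
        x_c = features[labels == c]
        mean_c = x_c.mean(axis=0)
        centered = x_c - mean_c
        s_w += centered.T @ centered
        diff = (mean_c - global_mean)[:, None]
        s_b += len(x_c) * (diff @ diff.T)
    return s_w / n, s_b / n


def fisher_criterion(features: np.ndarray, labels: np.ndarray) -> float:
    """J = trace(S_w^{-1} S_b); larger values mean the labeled classes are easier to separate."""
    s_w, s_b = scatter_matrices(features, labels)
    return float(np.trace(np.linalg.solve(s_w, s_b)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = np.repeat([0, 1, 2], 200)
    noise = rng.standard_normal((600, 4))
    # Hypothetical class means: widely spaced versus closely spaced
    far = noise + np.array([[0, 0, 0, 0], [4, 0, 0, 0], [0, 4, 0, 0]])[labels]
    near = noise + np.array([[0, 0, 0, 0], [0.5, 0, 0, 0], [0, 0.5, 0, 0]])[labels]
    print(fisher_criterion(far, labels) > fisher_criterion(near, labels))  # True
```

Because J is unchanged by any invertible linear transform of the features, it compares the information content of different front ends (for example, MFCC versus MVDR-based cepstra) rather than their scaling.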