A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition

Authors:
Umit H. Yapanel;John H. L. Hansen
Affiliations:
The Center for Robust Speech Systems, University of Texas at Dallas, Department of Electrical Engineering, EC33, P.O. Box 830688, Richardson, TX 75083-0688, USA;The Center for Robust Speech Systems, University of Texas at Dallas, Department of Electrical Engineering, EC33, P.O. Box 830688, Richardson, TX 75083-0688, USA
Venue:
Speech Communication
Year:
2008

Abstract

Acoustic feature extraction from speech constitutes a fundamental component of automatic speech recognition (ASR) systems. In this paper, we propose a novel feature extraction algorithm, perceptual-MVDR (PMVDR), which computes cepstral coefficients from the speech signal. This new feature representation is shown to model the speech spectrum better than traditional feature extraction approaches. Experimental results for small (40-word digits) to medium (5k-word dictation) vocabulary tasks show consistent improvements of varying degree across experiments; the new front-end is most effective in noisy car environments. The PMVDR front-end uses the minimum variance distortionless response (MVDR) spectral estimator to represent the upper envelope of the speech signal. Unlike Mel frequency cepstral coefficients (MFCCs), the proposed front-end does not use a filterbank. The effectiveness of the PMVDR approach is demonstrated by comparing speech recognition accuracies with the traditional MFCC front-end and the recently proposed PMCC front-end in both noise-free and real adverse environments. For speech recognition in noisy car environments (a 40-word vocabulary task), the PMVDR front-end provides a 36% relative decrease in word error rate (WER) over the MFCC front-end. Under simulated speaker stress conditions (a 35-word vocabulary task), the PMVDR front-end yields a 27% relative decrease in WER. For a noise-free dictation task (a 5k-word vocabulary task), a relative 8% reduction in WER is reported. Finally, a novel analysis technique is proposed to quantify the noise robustness of an acoustic front-end. This analysis is conducted for the acoustic front-ends considered in the paper, and the results are presented.
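
As a rough illustration of the MVDR spectral estimator mentioned in the abstract, the sketch below evaluates the textbook Capon/MVDR power spectrum, P(w) = 1 / (v(w)^H R^{-1} v(w)), directly from a frame's autocorrelation sequence, and then derives cepstral coefficients from the log spectrum with an inverse FFT. It is not the authors' PMVDR implementation: the perceptual frequency warping is omitted, and the function names (`autocorrelation`, `mvdr_spectrum`, `cepstrum_from_spectrum`), the model order, the frequency grid size, and the small diagonal `ridge` term are all assumptions made for this example.

```python
import numpy as np


def autocorrelation(frame, order):
    """Biased autocorrelation estimates r[0..order] of a 1-D frame."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    return np.array(
        [np.dot(frame[: n - k], frame[k:]) / n for k in range(order + 1)]
    )


def mvdr_spectrum(r, n_freqs=256, ridge=1e-9):
    """Capon/MVDR power spectrum on n_freqs points in [0, pi].

    Uses the definition P(w) = 1 / (v(w)^H R^{-1} v(w)), where R is the
    Toeplitz autocorrelation matrix built from r and v(w) is the vector
    [1, e^{jw}, ..., e^{jMw}]. No perceptual warping is applied here.
    """
    r = np.asarray(r, dtype=float)
    order = len(r) - 1
    idx = np.arange(order + 1)
    # Symmetric Toeplitz matrix: R[i, j] = r[|i - j|]
    R = r[np.abs(idx[:, None] - idx[None, :])]
    # Small diagonal loading (an assumption) to keep R well conditioned
    R = R + ridge * (r[0] + 1e-12) * np.eye(order + 1)
    omegas = np.linspace(0.0, np.pi, n_freqs)
    V = np.exp(1j * np.outer(idx, omegas))  # one steering vector per column
    RinvV = np.linalg.solve(R, V)
    denom = np.real(np.sum(np.conj(V) * RinvV, axis=0))
    return omegas, 1.0 / denom


def cepstrum_from_spectrum(power, n_ceps=13):
    """Cepstral coefficients from a one-sided power spectrum on [0, pi]."""
    log_power = np.log(power)
    # irfft treats log_power as the one-sided spectrum of a real sequence
    return np.fft.irfft(log_power)[:n_ceps]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(400)
    frame = np.cos(0.5 * t) + 0.05 * rng.standard_normal(t.size)
    r = autocorrelation(frame, order=12)
    omegas, power = mvdr_spectrum(r)
    print("peak frequency (rad/sample):", omegas[np.argmax(power)])
    print("first cepstral coefficients:", cepstrum_from_spectrum(power)[:4])
```

Because the spectrum is computed on a dense frequency grid rather than through a bank of band-pass filters, the sketch reflects the filterbank-free idea described in the abstract. A practical front-end would normally obtain the MVDR spectrum from linear prediction coefficients instead of solving a linear system per frame; the direct form is used here only because it follows the definition most transparently.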