Higher order cepstral moment normalization for improved robust speech recognition

  • Authors:
  • Chang-Wen Hsu; Lin-Shan Lee

  • Affiliations:
  • Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan (both authors)

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2009

Abstract

Cepstral normalization has been widely used as a powerful approach to produce robust features for speech recognition. Good examples of this approach include cepstral mean subtraction and cepstral mean and variance normalization, in which either the first moment or both the first and second moments of the Mel-frequency cepstral coefficients (MFCCs) are normalized. In this paper, we propose the family of higher order cepstral moment normalization, in which the MFCC parameters are normalized with respect to a few moments of orders higher than 1 or 2. The basic idea is that the higher order moments are more dominated by samples with larger values, which are very likely the primary sources of the asymmetry and abnormal flatness or tail size of the parameter distributions. Normalization with respect to these moments therefore puts more emphasis on these signal components and constrains the distributions to be more symmetric with more reasonable flatness and tail size. The fundamental principles behind this approach are also analyzed and discussed based on the statistical properties of the distributions of the MFCC parameters. Experimental results based on the AURORA 2, AURORA 3, AURORA 4, and Resource Management (RM) testing environments show that with the proposed approach, recognition accuracy can be significantly and consistently improved for all types of noise and all SNR conditions.
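
The sketch below illustrates the general idea of higher order cepstral moment normalization in NumPy/SciPy, not the paper's exact algorithm: for each cepstral dimension of an utterance, a bias is found that drives a chosen odd-order moment to zero (restoring symmetry), and a scale is applied so that a chosen even-order moment equals one (constraining flatness/tail size). The function name `hocmn`, the specific moment orders (3 and 4), and the unit target value are illustrative assumptions; the paper's actual orders, targets, and estimation details may differ.

```python
import numpy as np
from scipy.optimize import brentq


def hocmn(cepstra, odd_order=3, even_order=4):
    """Illustrative higher order cepstral moment normalization (a sketch).

    cepstra: (num_frames, num_coeffs) array of MFCCs for one utterance.
    For each coefficient dimension, a bias b is chosen so that the
    odd_order-th moment of (c - b) is zero, then the shifted values are
    scaled so that their even_order-th moment equals one.
    """
    normalized = np.empty_like(cepstra, dtype=float)
    for d in range(cepstra.shape[1]):
        x = cepstra[:, d].astype(float)

        # The odd-order moment is monotonically decreasing in b, so a
        # simple bracketing root finder locates the bias that zeroes it.
        def odd_moment(b):
            return np.mean((x - b) ** odd_order)

        lo, hi = x.min() - 1e-6, x.max() + 1e-6
        b = brentq(odd_moment, lo, hi)
        shifted = x - b

        # Scale so the chosen even-order moment of the shifted values is 1.
        m_even = np.mean(shifted ** even_order)
        scale = m_even ** (1.0 / even_order)
        normalized[:, d] = shifted / scale if scale > 0 else shifted
    return normalized
```

As a usage example, applying `hocmn` per utterance to a (frames x 13) MFCC matrix plays the same role as per-utterance cepstral mean and variance normalization, except that the first and second moments are replaced by the higher odd and even orders above.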