Fast communication: Improved modulation spectrum enhancement methods for robust speech recognition

Authors:
Jeih-Weih Hung;Wen-Hsiang Tu;Chien-Chou Lai
Affiliations:
Department of Electrical Engineering, National Chi Nan University, Nantou 545, Taiwan;Department of Electrical Engineering, National Chi Nan University, Nantou 545, Taiwan;Department of Electrical Engineering, National Chi Nan University, Nantou 545, Taiwan
Venue:
Signal Processing
Year:
2012



Abstract

In this paper, we present two novel algorithms that improve the noise robustness of speech recognition features: modulation spectrum replacement (MSR) and modulation spectrum filtering (MSF). Both update the magnitude spectra of the feature streams using information collected from the clean training set, and the resulting feature streams are more robust to noise and yield higher recognition accuracy. In experiments on the Aurora-2 noisy digit database, the proposed MSR achieves an average relative error reduction of nearly 57% over baseline processing, and MSF is especially effective at enhancing features that have already been preprocessed by conventional feature normalization methods, giving even better recognition accuracy under noise-corrupted conditions.
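
The abstract does not give the exact update rules, so the following is only a rough sketch of the general idea behind replacing a modulation magnitude spectrum. It assumes that the reference is the mean DFT magnitude of the clean training feature streams and that the phase of the stream being processed is kept unchanged. The function names `reference_magnitude` and `replace_modulation_spectrum` and the parameter `n_fft` are hypothetical and are not taken from the paper.

```python
import numpy as np

def reference_magnitude(clean_streams, n_fft):
    """Mean DFT magnitude over clean training feature streams.

    Assumption: the "information collected in the clean training set"
    is modelled here as a simple average magnitude spectrum.
    """
    mags = [np.abs(np.fft.rfft(s, n=n_fft)) for s in clean_streams]
    return np.mean(mags, axis=0)

def replace_modulation_spectrum(stream, ref_mag, n_fft):
    """Replace the magnitude spectrum of one feature stream, keeping its phase.

    `stream` is the temporal trajectory of a single feature dimension
    (for example one cepstral coefficient over all frames of an utterance).
    """
    stream = np.asarray(stream, dtype=float)
    if n_fft < len(stream):
        raise ValueError("n_fft must be at least the stream length")
    spec = np.fft.rfft(stream, n=n_fft)          # modulation spectrum
    phase = np.angle(spec)                       # original phase is preserved
    new_spec = ref_mag * np.exp(1j * phase)      # reference magnitude, old phase
    # Back to the time domain; drop the zero-padded tail, if any.
    return np.fft.irfft(new_spec, n=n_fft)[: len(stream)]
```

In this sketch each feature dimension would be processed independently, one utterance at a time. A filtering variant would instead derive a gain per modulation frequency from the reference and apply it to the stream, but the abstract gives no detail on how MSF constructs its filter.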