Robust speech recognition using spatial-temporal feature distribution characteristics

Authors:
Berlin Chen; Wei-Hau Chen; Shih-Hsiang Lin; Wen-Yi Chu
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, Taiwan (all authors)
Venue:
Pattern Recognition Letters
Year:
2011

Abstract

Histogram equalization (HEQ) is one of the most efficient and effective techniques for reducing the mismatch between training and test acoustic conditions. However, most current HEQ methods operate on each feature dimension independently and ignore the contextual relationships between consecutive speech frames. In this paper, we present several novel HEQ approaches that exploit spatial-temporal feature distribution characteristics for speech feature normalization. Automatic speech recognition (ASR) experiments were carried out on the Aurora-2 standard noise-robust ASR task, and the presented approaches were evaluated against other popular HEQ methods. The results show that, for clean-condition training, our approaches yield a significant word error rate reduction over the baseline system and perform competitively with the other HEQ methods compared in this paper.
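
For orientation, the conventional dimension-wise HEQ that the abstract contrasts against might be sketched as below. This is only an illustration of the baseline idea, not the spatial-temporal approaches proposed in the paper. The function name `heq_dimension_wise`, the choice of a standard normal reference distribution, and the rank-based estimate of the empirical CDF are all assumptions made for this example.

```python
from statistics import NormalDist


def heq_dimension_wise(frames):
    """Equalize each feature dimension of one utterance independently.

    frames: list of equal-length feature vectors, one per speech frame.
    Returns new vectors whose per-dimension values follow the reference
    distribution (assumed here to be a standard normal).
    """
    if not frames:
        return []
    n_frames, n_dims = len(frames), len(frames[0])
    reference = NormalDist(mu=0.0, sigma=1.0)  # assumed reference distribution
    out = [[0.0] * n_dims for _ in range(n_frames)]
    for d in range(n_dims):
        # Rank the frames by their value in dimension d only; no other
        # dimension and no neighbouring frame is consulted.
        order = sorted(range(n_frames), key=lambda t: frames[t][d])
        for rank, t in enumerate(order):
            # Rank-based empirical CDF, kept strictly inside (0, 1).
            cdf = (rank + 0.5) / n_frames
            # Map through the inverse CDF of the reference distribution.
            out[t][d] = reference.inv_cdf(cdf)
    return out


# Toy utterance: three frames with two feature dimensions each.
noisy = [[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]]
equalized = heq_dimension_wise(noisy)
```

Because every dimension is ranked and mapped on its own, the sketch uses neither cross-dimension (spatial) nor cross-frame (temporal) structure, which is the limitation the abstract points to.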