In this paper, we present an effective cepstral feature compensation scheme which leverages knowledge of the speech model in order to achieve robust speech recognition. In the proposed scheme, the requirement for a prior noisy speech database in off-line training is eliminated by employing parallel model combination for the noise-corrupted speech model. Gaussian mixture models of clean speech and noise are used for the model combination. The noisy speech model can therefore be adapted simply by updating the noise model. Since the combination is performed in the cepstral domain, the method offers reduced computational expense and improved accuracy in model estimation. In order to cope with time-varying background noise, a novel method of interpolating multiple environmental models is employed. By sequentially calculating the posterior probability of each environmental model, the compensation procedure can be applied on a frame-by-frame basis. To reduce the computational expense introduced by the multiple-model method, a technique for sharing similar Gaussian components is proposed. Acoustically similar components across an inventory of environmental models are selected by the proposed sub-optimal algorithm, which employs the Kullback-Leibler divergence as a similarity measure. The combined hybrid model, which consists of the selected Gaussian components, is used as the shared noisy speech model. The performance is examined using Aurora2 and speech data collected in an in-vehicle environment. The proposed feature compensation algorithm is compared with standard methods in the field (e.g., CMN, spectral subtraction, RATZ). The experimental results demonstrate that the proposed feature compensation schemes are very effective in achieving robust speech recognition in adverse noisy environments, and that the proposed model combination-based feature compensation method is superior to existing model-based feature compensation methods.
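The model-combination and multiple-model interpolation ideas described above can be sketched as follows. This is an illustrative outline only, not the paper's actual implementation: it combines clean-speech Gaussian means with a noise mean via the log-add approximation in the log-spectral domain (the paper works in the cepstral domain via DCT/IDCT, omitted here), and computes per-frame posterior probabilities over a set of environment models assuming equal-weight, shared-variance diagonal Gaussians. All function names and shapes are assumptions.

```python
import numpy as np

def combine_models(clean_means, noise_mean):
    """Log-add approximation: mu_noisy = log(exp(mu_clean) + exp(mu_noise)).

    Operates on log-spectral means; the cepstral-domain version would
    wrap this with DCT/inverse-DCT transforms.
    """
    return np.log(np.exp(clean_means) + np.exp(noise_mean))

def gaussian_loglik(x, means, var):
    """Diagonal-covariance log-likelihood of frame x under each mean row."""
    d = x.shape[-1]
    diff = x - means
    return -0.5 * (np.sum(diff ** 2 / var, axis=-1)
                   + d * np.log(2.0 * np.pi) + np.sum(np.log(var)))

def environment_posteriors(x, env_models, var):
    """Posterior probability of each environment model for a single frame.

    Each entry of env_models is a (K, d) array of noisy-speech mixture
    means (equal mixture weights assumed for brevity).
    """
    logliks = np.array([
        np.logaddexp.reduce(gaussian_loglik(x, means, var))
        for means in env_models
    ])
    w = np.exp(logliks - logliks.max())   # stable softmax over models
    return w / w.sum()
```

In a full compensation loop, these posteriors would weight the per-environment compensation terms frame by frame, which is what lets the scheme track time-varying background noise without committing to a single environment model.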
Of particular interest, the proposed method achieves up to an 11.59% relative WER reduction compared with the ETSI AFE front-end. The multiple-model approach is effective at coping with changing noise conditions in the input speech, producing performance comparable to the matched-model condition. Applying the mixture-sharing method brings a significant reduction in computational overhead while maintaining recognition performance at a reasonable level with near real-time operation.
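The mixture-sharing step can likewise be sketched. The snippet below is a hypothetical, greedy stand-in for the paper's sub-optimal selection algorithm: it pools diagonal-covariance Gaussian components from all environment models and assigns each one to an existing representative whenever their symmetrized Kullback-Leibler divergence falls below a threshold, otherwise promoting it to a new representative. Names, the greedy strategy, and the threshold are assumptions for illustration.

```python
import numpy as np

def kl_diag_gauss(m1, v1, m2, v2):
    """KL(N1 || N2) between diagonal-covariance Gaussians."""
    return 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def select_shared(components, threshold):
    """Greedy grouping of pooled (mean, var) components across models.

    Each component joins the first representative within `threshold`
    symmetric KL divergence; otherwise it starts a new group. The
    representatives form the combined hybrid model.
    """
    reps = []          # representative components of the hybrid model
    assignment = []    # representative index for each input component
    for m, v in components:
        for i, (rm, rv) in enumerate(reps):
            d = kl_diag_gauss(m, v, rm, rv) + kl_diag_gauss(rm, rv, m, v)
            if d < threshold:
                assignment.append(i)
                break
        else:
            assignment.append(len(reps))
            reps.append((m, v))
    return reps, assignment
```

Because acoustically similar components collapse onto one shared Gaussian, the per-frame likelihood computations over the model inventory shrink accordingly, which is the source of the computational savings reported above.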