Variational noise model composition through model perturbation for robust speech recognition with time-varying background noise

Authors:
Wooil Kim;John H. L. Hansen
Affiliations:
Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, Department of Electrical Engineering, University of Texas at Dallas, 2601 N. Floyd Road, EC33, Ric ...;Center for Robust Speech Systems (CRSS), Erik Jonsson School of Engineering and Computer Science, Department of Electrical Engineering, University of Texas at Dallas, 2601 N. Floyd Road, EC33, Ric ...
Venue:
Speech Communication
Year:
2011

Abstract

This study proposes a novel model composition method to improve speech recognition performance in time-varying background noise conditions. The motivation is that each cepstral coefficient reflects how rapidly a component of the log-spectral envelope varies across frequency. Accordingly, variational noise models are formulated by selectively applying perturbation factors to the mean parameters of a basis model, yielding a collection of noise models that more accurately reflects the natural range of spectral patterns seen in the log-spectral domain. The basis noise model is obtained from the silence segments of the input speech. The perturbation factors are designed separately for changes in the energy level and in the spectral envelope. The proposed variational model composition (VMC) method is used to generate multiple environmental models for our previously proposed parallel combined Gaussian mixture model (PCGMM) based feature compensation algorithm. A mixture sharing technique is integrated to reduce the computational expense incurred by employing the variational models. Experimental results show that the proposed method is considerably more effective at improving speech recognition performance in time-varying background noise conditions, with +31.31%, +10.65%, and +20.54% average relative improvements in word error rate for speech babble, background music, and real-life in-vehicle noise conditions, respectively, compared to the original basic PCGMM method.
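
The idea of composing several noise models by perturbing the mean of a basis model can be illustrated with a minimal sketch. The abstract does not give the actual perturbation rule, so everything specific here is an assumption made for illustration: the perturbations are additive offsets, the energy-level perturbation is applied to the zeroth cepstral coefficient, the spectral-envelope perturbation is applied to a single low-order coefficient, and the function names `estimate_basis_mean` and `compose_variational_means` are hypothetical. It is not the authors' VMC implementation.

```python
from itertools import product
from statistics import fmean


def estimate_basis_mean(silence_frames):
    """Average the cepstral vectors of the silence frames (basis noise mean)."""
    n_coeffs = len(silence_frames[0])
    return [fmean(frame[k] for frame in silence_frames) for k in range(n_coeffs)]


def compose_variational_means(basis_mean, energy_offsets, envelope_offsets,
                              envelope_index=1):
    """Return one perturbed mean vector per (energy, envelope) offset pair.

    Assumptions (not taken from the paper): offsets are additive, c0 carries
    the overall energy level, and one low-order coefficient stands in for the
    broad spectral envelope.
    """
    variants = []
    for d_energy, d_envelope in product(energy_offsets, envelope_offsets):
        mean = list(basis_mean)               # copy so the basis stays intact
        mean[0] += d_energy                   # shift the energy level
        mean[envelope_index] += d_envelope    # shift the envelope shape
        variants.append(mean)
    return variants


# Toy data: two silence frames with three cepstral coefficients each.
silence = [[10.0, 1.0, 0.5], [12.0, 3.0, -0.5]]
basis = estimate_basis_mean(silence)
models = compose_variational_means(basis, [-1.0, 0.0, 1.0], [-0.5, 0.0, 0.5])
print(len(models), "variational mean vectors derived from basis", basis)
```

With three energy offsets and three envelope offsets the sketch yields nine mean vectors, one of which (both offsets zero) is the basis model itself. The growth in model count with the number of offsets is the kind of cost that a mixture sharing scheme would aim to contain.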