One way of making speech recognisers more robust to noise is model compensation. Rather than enhancing the incoming observations, model compensation techniques modify a recogniser's state-conditional distributions so that they model the speech in the target environment. Because the interaction between speech and noise is non-linear, the corrupted speech distribution has no closed form even when the speech and noise are Gaussian. Model compensation methods therefore approximate it with a parametric distribution, such as a Gaussian or a mixture of Gaussians. The impact of this approximation has never been quantified. This paper introduces a non-parametric method that computes the likelihood of a corrupted speech observation by sampling; given the speech and noise distributions and a mismatch function, it is exact in the limit. It therefore provides a theoretical bound for model compensation: the point where the Kullback-Leibler (KL) divergence to the ideal compensation is zero. Though computing the likelihood in this way is computationally expensive, the method enables compensation schemes to be compared on the criterion they aim to minimise, the KL divergence to the ideal compensation. The paper examines the performance of compensation methods such as vector Taylor series (VTS) and data-driven parallel model combination (DPMC), and shows that modelling the corrupted speech more accurately than with a single Gaussian improves speech recognition performance.
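The sampling idea can be illustrated with a minimal one-dimensional sketch. The Gaussian parameters, the log-add mismatch function, and the kernel-density estimator below are illustrative assumptions for exposition, not the paper's exact algorithm; the key point is that drawing joint speech and noise samples and passing them through the non-linear mismatch function yields a corrupted speech distribution that a single moment-matched Gaussian cannot capture exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D Gaussian speech and noise models in the log-spectral domain
# (parameters are assumptions, not taken from the paper).
mu_x, var_x = 2.0, 1.0    # clean speech
mu_n, var_n = 0.5, 0.25   # additive noise

def mismatch(x, n):
    # Standard log-add mismatch function: corrupted speech in the log domain.
    return np.log(np.exp(x) + np.exp(n))

# Draw joint samples and map them through the non-linear mismatch function.
N = 100_000
x = rng.normal(mu_x, np.sqrt(var_x), N)
n = rng.normal(mu_n, np.sqrt(var_n), N)
y = mismatch(x, n)

def likelihood(y_obs, bandwidth=0.05):
    """Non-parametric likelihood of an observation: a Gaussian kernel density
    estimate over the samples, exact as N grows and the bandwidth shrinks."""
    k = np.exp(-0.5 * ((y_obs - y) / bandwidth) ** 2)
    return k.mean() / (bandwidth * np.sqrt(2.0 * np.pi))

def gauss_pdf(v, mu, var):
    """Moment-matched Gaussian compensation (as a DPMC-style baseline)."""
    return np.exp(-0.5 * (v - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Compare the sampled (non-parametric) likelihood with the single-Gaussian
# approximation at a test observation.
y_obs = 2.5
print(likelihood(y_obs), gauss_pdf(y_obs, y.mean(), y.var()))
```

Averaging the log-ratio of the two likelihoods over many sampled observations would estimate the KL divergence from the true corrupted speech distribution to the Gaussian approximation, which is the comparison criterion the paper advocates.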