One way of making speech recognisers more robust to noise is model compensation. Rather than enhancing the incoming observations, model compensation techniques modify a recogniser's state-conditional distributions so that they model the speech in the target environment. Because the interaction between speech and noise is non-linear, the corrupted speech distribution has no closed form even when the speech and noise are Gaussian. Model compensation methods therefore approximate it with a parametric distribution, such as a Gaussian or a mixture of Gaussians. The impact of this approximation has never been quantified. This paper introduces a non-parametric method that computes the likelihood of a corrupted speech observation by sampling; given the speech and noise distributions and a mismatch function, it is exact in the limit. It therefore provides a theoretical bound for model compensation: the point where the Kullback-Leibler (KL) divergence to the ideal compensation is zero. Though computing the likelihood in this way is computationally expensive, the method enables compensation schemes to be compared on the criterion they aim to minimise, the KL divergence to the ideal compensation. The paper examines the performance of compensation methods such as vector Taylor series (VTS) and data-driven parallel model combination (DPMC), and shows that modelling the corrupted speech more accurately than with a single Gaussian improves speech recognition performance.
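The sampling idea can be illustrated with a minimal one-dimensional sketch. The Gaussian parameters, the log-add mismatch function, and the kernel-density estimator below are illustrative assumptions for exposition, not the paper's exact algorithm; the key point is that drawing joint speech and noise samples and passing them through the non-linear mismatch function yields a corrupted speech distribution that a single moment-matched Gaussian cannot capture exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 1-D Gaussian speech and noise models in the log-spectral domain
# (parameters are assumptions, not taken from the paper).
mu_x, var_x = 2.0, 1.0    # clean speech
mu_n, var_n = 0.5, 0.25   # additive noise

def mismatch(x, n):
    # Standard log-add mismatch function: corrupted speech in the log domain.
    return np.log(np.exp(x) + np.exp(n))

# Draw joint samples and map them through the non-linear mismatch function.
N = 100_000
x = rng.normal(mu_x, np.sqrt(var_x), N)
n = rng.normal(mu_n, np.sqrt(var_n), N)
y = mismatch(x, n)

def likelihood(y_obs, bandwidth=0.05):
    """Non-parametric likelihood of an observation: a Gaussian kernel density
    estimate over the samples, exact as N grows and the bandwidth shrinks."""
    k = np.exp(-0.5 * ((y_obs - y) / bandwidth) ** 2)
    return k.mean() / (bandwidth * np.sqrt(2.0 * np.pi))

def gauss_pdf(v, mu, var):
    """Moment-matched Gaussian compensation (as a DPMC-style baseline)."""
    return np.exp(-0.5 * (v - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

# Compare the sampled (non-parametric) likelihood with the single-Gaussian
# approximation at a test observation.
y_obs = 2.5
print(likelihood(y_obs), gauss_pdf(y_obs, y.mean(), y.var()))
```

Averaging the log-ratio of the two likelihoods over many sampled observations would estimate the KL divergence from the true corrupted speech distribution to the Gaussian approximation, which is the comparison criterion the paper advocates.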