Importance sampling to compute likelihoods of noise-corrupted speech

  • Authors:
  • R. C. van Dalen; M. J. F. Gales

  • Affiliations:
  • Engineering Department, University of Cambridge, Trumpington Street, Cambridge CB2 1PZ, United Kingdom (both authors)

  • Venue:
  • Computer Speech and Language
  • Year:
  • 2013

Abstract

One way of making speech recognisers more robust to noise is model compensation. Rather than enhancing the incoming observations, model compensation techniques modify a recogniser's state-conditional distributions so that they model the speech in the target environment. Because the interaction between speech and noise is non-linear, the corrupted speech distribution has no closed form even for Gaussian speech and noise. Thus, model compensation methods approximate it with a parametric distribution, such as a Gaussian or a mixture of Gaussians. The impact of this approximation has never been quantified. This paper therefore introduces a non-parametric method to compute the likelihood of a corrupted speech observation. It uses sampling and, given the speech and noise distributions and a mismatch function, is exact in the limit. It therefore gives a theoretical bound for model compensation. Though computing the likelihood is computationally expensive, the novel method enables a performance comparison based on the criterion that model compensation methods aim to minimise: the Kullback-Leibler (KL) divergence to the ideal compensation. It gives the point where the KL divergence is zero. This paper examines the performance of various compensation methods, such as vector Taylor series (VTS) and data-driven parallel model combination (DPMC). It shows that modelling the corrupted speech distribution more accurately than Gaussian-for-Gaussian compensation improves speech recognition performance.
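The abstract gives no implementation details, but the idea of estimating the corrupted-speech likelihood by sampling can be sketched. The fragment below is a minimal illustration, not the authors' algorithm: it assumes a single log-spectral coefficient, Gaussian speech and noise, the standard log-add mismatch function y = log(exp(x) + exp(n)) with no phase term, and an arbitrarily chosen Gaussian proposal distribution. All function and variable names are invented for the example. Given clean speech x, the noise that yields observation y is n = log(exp(y) - exp(x)) (only possible for x < y), with Jacobian exp(y) / (exp(y) - exp(x)), so p(y) can be estimated by importance sampling over x.

```python
import numpy as np
from scipy.stats import norm


def corrupted_likelihood(y, speech_mean, speech_std, noise_mean, noise_std,
                         num_samples=100_000, rng=None):
    """Importance-sampling estimate of p(y) for one log-spectral coefficient.

    Target integral:
        p(y) = int_{x < y} p_x(x) * p_n(log(exp(y) - exp(x)))
               * exp(y) / (exp(y) - exp(x)) dx
    where the last factor is the Jacobian of the change of variables n -> y
    under the mismatch function y = log(exp(x) + exp(n)).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Proposal q(x): a broadened Gaussian around the clean-speech mean.
    # This is an illustrative choice, not the proposal used in the paper.
    q_mean, q_std = speech_mean, 2.0 * speech_std
    x = rng.normal(q_mean, q_std, size=num_samples)

    weights = np.zeros(num_samples)
    valid = x < y                     # only x < y can produce observation y
    xv = x[valid]
    n = np.log(np.exp(y) - np.exp(xv))                 # implied noise value
    jacobian = np.exp(y) / (np.exp(y) - np.exp(xv))    # |dn/dy|
    weights[valid] = (norm.pdf(xv, speech_mean, speech_std)
                      / norm.pdf(xv, q_mean, q_std)    # importance weight p_x / q
                      * norm.pdf(n, noise_mean, noise_std) * jacobian)
    return weights.mean()              # unbiased estimate; exact as num_samples -> infinity


# Example: speech slightly louder than the noise, observation above both means.
print(corrupted_likelihood(y=1.0, speech_mean=0.5, speech_std=0.3,
                           noise_mean=0.2, noise_std=0.4))
```

Averaging such estimates over observations drawn from the true corrupted-speech distribution is one way to approximate the KL divergence between a parametric compensation scheme and the ideal compensation, which is the comparison criterion the abstract describes.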