Reconstruction of missing features by means of multivariate Laplace distribution (MLD) for noise robust speech recognition

  • Authors:
  • Arash Mohammadi;Farshad Almasganj

  • Affiliations:
  • Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Iran;Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Iran

  • Venue:
  • Expert Systems with Applications: An International Journal
  • Year:
  • 2011


Abstract

Speech recognition accuracy degrades in the presence of additive noise, especially when the recognizer is trained on clean data. Several methods have been proposed to compensate for the effects of noise on recognition accuracy, and among them, Missing Feature Techniques (MFT) have shown promising results. Two MFT approaches have been introduced in the literature: "Model-Based" and "Feature-Based". In the first category, the computation of the state distributions must be modified, and further changes are required to work with filter-bank features. In the second category, by contrast, compensated representations of the corrupted signals are reconstructed prior to recognition, so conventional recognizers using MFCC features can then be applied. In "Feature-Based" MFT, the spectral vectors of speech frames are conventionally modeled by a Gaussian distribution (GD), and the missing parts of the speech representation are reconstructed from the estimated model parameters. In this paper, building on earlier research suggesting that the multivariate Laplace distribution (MLD) is a suitable model for speech signals, we examine this idea for modeling the log-spectral representation of speech frames and show that MLD models it better than GD. Moreover, we apply Maximum Likelihood (ML) estimation of the missing elements conditioned on the observed values under the MLD, prove that the resulting estimation equations are simple and tractable, and, by using this estimate to reconstruct missing features, obtain better phoneme recognition accuracy in noisy conditions than with GD. At SNRs below 10 dB, for all noise types considered, MLD improves recognition accuracy by more than 4% in most cases.
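The abstract compares MLD-based reconstruction against the conventional Gaussian baseline. As a reference point, the following is a minimal sketch of that Gaussian baseline only, assuming a single full-covariance Gaussian fitted to clean log-spectral training frames and a reliability mask that is already given. Each missing component is replaced by its conditional mean given the observed components, E[x_m | x_o] = μ_m + Σ_mo Σ_oo⁻¹ (x_o − μ_o), which for a Gaussian is also the maximum-likelihood (mode) estimate. The function names `fit_gaussian` and `reconstruct_frame` are hypothetical, and the paper's MLD estimator is not reproduced here.

```python
import numpy as np


def fit_gaussian(frames):
    """Estimate mean and full covariance of clean log-spectral frames.

    frames: array of shape (num_frames, num_channels).
    """
    frames = np.asarray(frames, dtype=float)
    mu = frames.mean(axis=0)
    sigma = np.cov(frames, rowvar=False)  # unbiased (N - 1) normalization
    return mu, sigma


def reconstruct_frame(x, missing, mu, sigma):
    """Fill missing components of one frame with E[x_m | x_o] under N(mu, sigma).

    x:       noisy log-spectral vector, shape (d,)
    missing: boolean mask, True where a component is treated as missing
    """
    x = np.asarray(x, dtype=float)
    missing = np.asarray(missing, dtype=bool)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    obs = ~missing
    out = x.copy()
    if not missing.any():
        return out                      # nothing to reconstruct
    if not obs.any():
        out[missing] = mu[missing]      # no evidence: fall back to the prior mean
        return out
    s_mo = sigma[np.ix_(missing, obs)]  # cross-covariance, missing vs observed
    s_oo = sigma[np.ix_(obs, obs)]      # covariance of the observed block
    # Solve the linear system rather than explicitly inverting s_oo.
    gain = np.linalg.solve(s_oo, x[obs] - mu[obs])
    out[missing] = mu[missing] + s_mo @ gain
    return out


if __name__ == "__main__":
    # Hypothetical 3-channel example with synthetic "clean" training frames.
    rng = np.random.default_rng(0)
    cov = [[1.0, 0.8, 0.3], [0.8, 1.0, 0.5], [0.3, 0.5, 1.0]]
    clean = rng.multivariate_normal([0.0, 1.0, 2.0], cov, size=500)
    mu, sigma = fit_gaussian(clean)
    noisy = np.array([0.5, 9.0, 2.5])     # channel 1 assumed noise-dominated
    mask = np.array([False, True, False])
    print(reconstruct_frame(noisy, mask, mu, sigma))
```

The observed components pass through unchanged; only the masked entries are overwritten, using the correlations learned from clean speech.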