Reconstruction of missing features by means of multivariate Laplace distribution (MLD) for noise robust speech recognition

Authors:
Arash Mohammadi; Farshad Almasganj
Affiliations:
Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Iran; Biomedical Engineering Department, Amirkabir University of Technology, Tehran, Iran
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Speech recognition accuracy degrades in the presence of additive noise, especially when the recognizer is trained on clean data. Several methods have been proposed to compensate for the effects of noise on recognition accuracy. Among these, Missing Feature Techniques (MFT) have shown promising results. Two MF approaches have been introduced in the literature: "model-based" and "feature-based". In the model-based approach, the state distribution calculations must be changed, and further modifications are required to cope with filter bank features. In the feature-based approach, by contrast, compensated representations of the corrupted signals are reconstructed prior to recognition, and conventional recognizers using MFCC features are then applied. In feature-based MFT, the spectral vectors of speech frames are conventionally modeled by a Gaussian distribution (GD), and the missing parts of the speech representation are reconstructed according to the estimated model parameters. In this paper, we build on earlier studies suggesting that the multivariate Laplace distribution (MLD) is a suitable model for speech signals. We examine this idea for the log-spectral representation of speech frames and show that the MLD models it better than the Gaussian distribution. Moreover, we derive the Maximum Likelihood (ML) estimate of the missing elements conditioned on the observed values under the MLD and show that the estimation equations are simple and tractable. Using this estimate to reconstruct the missing features yields better phoneme recognition accuracy in noisy conditions than using the GD. At SNR values below 10 dB, for all of the noise types considered, the MLD improves recognition accuracy by more than 4% in most cases.
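
To make the feature-based reconstruction step concrete, the following is a minimal sketch of the conventional Gaussian baseline mentioned in the abstract, in which the unreliable components of a log-spectral vector are replaced by their conditional mean given the reliable ones. It is an illustration only and not the paper's MLD estimator, whose equations are not given here. The function name `reconstruct_missing_gaussian`, the single-Gaussian model, and the boolean reliability mask are all assumptions made for this example.

```python
import numpy as np

def reconstruct_missing_gaussian(x, reliable, mean, cov):
    """Replace unreliable entries of x by their Gaussian conditional mean.

    Hypothetical helper for illustration: assumes one Gaussian with known
    mean and covariance, and a boolean mask marking reliable components.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    reliable = np.asarray(reliable, dtype=bool)
    missing = ~reliable

    if not missing.any():
        return x.copy()          # nothing to reconstruct
    if not reliable.any():
        return mean.copy()       # no evidence: fall back to the model mean

    # Partition the covariance into observed/observed and missing/observed blocks.
    cov_oo = cov[np.ix_(reliable, reliable)]
    cov_mo = cov[np.ix_(missing, reliable)]

    # Conditional mean: mu_m + cov_mo @ inv(cov_oo) @ (x_o - mu_o),
    # computed with a linear solve rather than an explicit inverse.
    weights = np.linalg.solve(cov_oo, x[reliable] - mean[reliable])

    out = x.copy()
    out[missing] = mean[missing] + cov_mo @ weights
    return out

# Toy example: second component is marked unreliable and gets re-estimated.
toy_mean = np.zeros(2)
toy_cov = np.array([[1.0, 0.5], [0.5, 1.0]])
reconstructed = reconstruct_missing_gaussian([2.0, 99.0], [True, False], toy_mean, toy_cov)
print(reconstructed)
```

In a practical missing-feature system the reliability mask would come from a separate noise or SNR estimate, and the model would typically be a mixture rather than a single density; both are outside the scope of this sketch.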