Speech recognition accuracy degrades in the presence of additive noise, especially when the recognizer has been trained on clean data. Several methods have been proposed to compensate for the effect of noise on recognition accuracy. Among them, Missing Feature Techniques (MFT) have shown promising results. Two MF approaches have been introduced in the literature: "Model-Based" and "Feature-Based". Model-based approaches require changing the recognizer's state distribution calculations, along with further modifications to cope with filter bank features. Feature-based approaches instead reconstruct compensated representations of the corrupted signal prior to recognition, so that a conventional recognizer using MFCC features can then be applied. In feature-based MFT, the spectral vectors of speech frames are conventionally modeled by a Gaussian distribution (GD), and the missing parts of the speech representation are reconstructed from the estimated model parameters. In this paper, we build on prior work suggesting that the multivariate Laplace distribution (MLD) is a suitable model for speech signals. We examine this idea for the log-spectral representation of speech frames and show that MLD fits the data better than a Gaussian distribution. Moreover, we derive the Maximum Likelihood (ML) estimate of the missing elements conditioned on the observed values under MLD and show that the resulting estimation equations are simple and tractable. Using this estimate to reconstruct missing features yields better phoneme recognition accuracy in noisy conditions than reconstruction based on GD. At SNR values below 10 dB, for all of the noise types considered, MLD improves recognition accuracy by more than 4% in most cases.
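To make the feature-based reconstruction step concrete, the following minimal sketch shows the conventional Gaussian baseline mentioned above: the missing components of a log-spectral vector are filled in with their conditional mean given the observed components. It is an illustration only, not the MLD estimator derived in the paper. The function name `reconstruct_missing`, the use of a single Gaussian (rather than a mixture), and the toy parameters are all assumptions made for this example.

```python
import numpy as np

def reconstruct_missing(x, observed_mask, mean, cov):
    """Fill missing entries of x with the conditional Gaussian mean.

    Hypothetical helper for illustration: x is one log-spectral vector,
    observed_mask marks reliable components, and (mean, cov) are the
    parameters of a single Gaussian assumed to be estimated from clean speech.
    """
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    o = np.asarray(observed_mask, dtype=bool)
    m = ~o

    out = x.copy()
    if not m.any():
        return out              # nothing to reconstruct
    if not o.any():
        out[m] = mean[m]        # no evidence: fall back to the prior mean
        return out

    cov_oo = cov[np.ix_(o, o)]  # observed/observed covariance block
    cov_mo = cov[np.ix_(m, o)]  # missing/observed covariance block

    # Conditional mean: mu_m + Sigma_mo Sigma_oo^{-1} (x_o - mu_o)
    w = np.linalg.solve(cov_oo, x[o] - mean[o])
    out[m] = mean[m] + cov_mo @ w
    return out

# Toy usage with made-up parameters (not data from the paper).
mean = np.zeros(2)
cov = np.array([[1.0, 0.5],
                [0.5, 1.0]])
x = np.array([2.0, np.nan])     # second component is missing
print(reconstruct_missing(x, [True, False], mean, cov))  # -> [2. 1.]
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse of the observed covariance block, which is the usual choice for numerical stability.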