Fusion of parametric and non-parametric approaches to noise-robust ASR

Authors:
Yang Sun;Jort F. Gemmeke;Bert Cranen;Louis Ten Bosch;Lou Boves
Venue:
Speech Communication
Year:
2014

Abstract

In this paper we present a principled method for fusing independent estimates of the state likelihood in a Dynamic Bayesian Network (DBN), using the Virtual Evidence option, to improve speech recognition on the Aurora-2 task. The first estimate is derived from a conventional parametric Gaussian Mixture Model (GMM); the second is obtained from a non-parametric Sparse Classification (SC) system. During training, the parameters of the input streams can be optimized either independently or jointly, provided that all streams represent true probability functions. During decoding, the stream weights can be varied much more freely. We found that the state likelihoods in the GMM and SC streams are very different, which makes it necessary to apply different weights to the streams in decoding. With optimal weights, the dual-input system can outperform both the individual GMM system and the individual SC system at all SNR levels in test sets A and B of the Aurora-2 task.
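
To illustrate why the choice of stream weights matters in decoding, the following is a minimal sketch of weighted stream fusion for a single frame. It assumes a log-linear combination (each stream's state likelihood raised to its weight, then multiplied), which is a common way to combine streams but is not spelled out in the abstract. It does not implement the DBN or the Virtual Evidence mechanism, and the function names, weights, and likelihood values are all hypothetical.

```python
import math


def fuse_log_likelihoods(gmm_loglik, sc_loglik, w_gmm=1.0, w_sc=1.0):
    """Fuse per-state log-likelihoods from two streams for one frame.

    Assumption: a weighted sum in the log domain, i.e. the product of the
    stream likelihoods, each raised to its stream weight.
    """
    if len(gmm_loglik) != len(sc_loglik):
        raise ValueError("both streams must score the same set of states")
    return [w_gmm * g + w_sc * s for g, s in zip(gmm_loglik, sc_loglik)]


def best_state(fused):
    """Return the index of the state with the highest fused score."""
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up likelihoods for three states in one frame (illustration only).
gmm = [math.log(0.70), math.log(0.20), math.log(0.10)]
sc = [math.log(0.30), math.log(0.60), math.log(0.10)]

# Equal weights: the GMM stream's preferred state wins.
equal = fuse_log_likelihoods(gmm, sc, w_gmm=1.0, w_sc=1.0)

# Hypothetical weights favouring the SC stream: the decision changes.
sc_heavy = fuse_log_likelihoods(gmm, sc, w_gmm=0.5, w_sc=1.5)

print(best_state(equal), best_state(sc_heavy))
```

In this toy frame the two streams disagree, so the selected state depends on the weights. When the streams produce likelihoods on very different scales, as the abstract reports for GMM and SC, unequal weights are one way to keep either stream from dominating.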