Uncertainty-based learning of acoustic models from noisy data

Authors: Alexey Ozerov; Mathieu Lagrange; Emmanuel Vincent
Affiliations: Technicolor Research & Innovation, France; STMS - IRCAM - CNRS - UPMC, France; INRIA, Centre de Rennes - Bretagne Atlantique, France
Venue: Computer Speech and Language
Year: 2013

Abstract

We consider the problem of acoustic modeling of noisy speech data, where the uncertainty about the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its use at the training stage has so far been limited to static model adaptation. We introduce a new expectation-maximization (EM) based technique, which we call uncertainty training, that makes it possible to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique on a GMM-based speaker recognition task using speech data corrupted by real-world domestic background noise, with a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed training algorithm yields a 3-4% absolute improvement in speaker recognition accuracy, whether training on matched, mismatched, or multi-condition noisy data. With minor modifications, the algorithm also applies to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data, and to data other than audio.
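To make the idea of training from uncertain data concrete, here is a minimal sketch of one EM iteration for a GMM. It is not the authors' implementation. It assumes diagonal covariances and the observation model x̂_n = x_n + e_n, with e_n ~ N(0, diag(unc_var_n)), where x̂_n is the enhanced feature vector for frame n and unc_var_n is its per-frame uncertainty variance. The function names and the synthetic data are hypothetical. The E-step scores each frame with the model variance plus the frame's uncertainty, which is the same likelihood used in uncertainty decoding. It then computes the posterior mean and variance of the clean features under each component, a per-dimension Wiener filter. The M-step is the usual GMM update, except that the variance update also includes the posterior variance of the clean features.

```python
import numpy as np


def log_gauss_diag(x, mean, var):
    """Log-density of N(x; mean, diag(var)), summed over the last axis."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)


def uncertainty_em_step(x_hat, unc_var, weights, means, variances):
    """One EM iteration for a diagonal GMM trained from uncertain observations.

    x_hat:     (N, D) point estimates of the clean features (e.g. enhanced features)
    unc_var:   (N, D) diagonal uncertainty variances, one vector per frame
    weights:   (K,)   mixture weights
    means:     (K, D) component means of the clean features
    variances: (K, D) component variances of the clean features
    Returns updated (weights, means, variances) and the log-likelihood
    of the data under the parameters that were passed in.
    """
    n_frames = x_hat.shape[0]
    # E-step (1): likelihood with model variance + frame uncertainty.
    tot_var = variances[None, :, :] + unc_var[:, None, :]            # (N, K, D)
    log_lik = log_gauss_diag(x_hat[:, None, :], means[None], tot_var)  # (N, K)
    log_post = np.log(weights)[None, :] + log_lik
    log_norm = np.logaddexp.reduce(log_post, axis=1, keepdims=True)    # (N, 1)
    gamma = np.exp(log_post - log_norm)                                # (N, K)

    # E-step (2): posterior of the clean features given x_hat and component k.
    gain = variances[None] / tot_var                                   # (N, K, D)
    post_mean = means[None] + gain * (x_hat[:, None, :] - means[None])
    post_var = (1.0 - gain) * variances[None]

    # M-step: standard GMM updates on the posterior clean-feature statistics.
    n_k = gamma.sum(axis=0)                                            # (K,)
    new_weights = n_k / n_frames
    new_means = np.einsum("nk,nkd->kd", gamma, post_mean) / n_k[:, None]
    diff = post_mean - new_means[None]
    new_vars = np.einsum("nk,nkd->kd", gamma, diff ** 2 + post_var) / n_k[:, None]
    return new_weights, new_means, new_vars, float(log_norm.sum())


# Usage on synthetic data: two clusters observed with known per-frame noise.
rng = np.random.default_rng(0)
clean = np.concatenate([rng.normal(-2.0, 0.5, (200, 2)), rng.normal(2.0, 0.5, (200, 2))])
unc_var = rng.uniform(0.1, 1.0, clean.shape)   # hypothetical uncertainty estimates
x_hat = clean + rng.normal(0.0, np.sqrt(unc_var))

w = np.full(2, 0.5)
mu = np.array([[-1.0, -1.0], [1.0, 1.0]])
var = np.ones((2, 2))
log_liks = []
for _ in range(30):
    w, mu, var, ll = uncertainty_em_step(x_hat, unc_var, w, mu, var)
    log_liks.append(ll)
```

When every uncertainty variance is zero, the gain is 1 and the posterior variance is 0, so the step reduces to conventional EM on x_hat. As with any EM procedure, the returned log-likelihood should not decrease from one iteration to the next.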