Filtering the time sequences of spectral parameters for speech recognition
Speech Communication
Robust automatic speech recognition with missing and unreliable acoustic data
Speech Communication
Broadband Beamforming with Adaptive Postfiltering for Speech Acquisition in Noisy Environments
ICASSP '97 Proceedings of the 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97) -Volume 1 - Volume 1
A vector Taylor series approach for environment-independent speech recognition
ICASSP '96 Proceedings of the Acoustics, Speech, and Signal Processing, 1996. on Conference Proceedings., 1996 IEEE International Conference - Volume 02
A computational auditory scene analysis system for speech segregation and robust speech recognition
Computer Speech and Language
IEEE Transactions on Audio, Speech, and Language Processing
EURASIP Journal on Audio, Speech, and Music Processing
A unifying view on dataset shift in classification
Pattern Recognition
Transforming Binary Uncertainties for Robust Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing
IEEE Transactions on Audio, Speech, and Language Processing
The PASCAL CHiME speech separation and recognition challenge
Computer Speech and Language
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Hi-index | 0.00 |
We consider the problem of acoustic modeling of noisy speech data, where the uncertainty over the data is given by a Gaussian distribution. While this uncertainty has been exploited at the decoding stage via uncertainty decoding, its usage at the training stage remains limited to static model adaptation. We introduce a new expectation maximization (EM) based technique, which we call uncertainty training, that allows us to train Gaussian mixture models (GMMs) or hidden Markov models (HMMs) directly from noisy data with dynamic uncertainty. We evaluate the potential of this technique for a GMM-based speaker recognition task on speech data corrupted by real-world domestic background noise, using a state-of-the-art signal enhancement technique and various uncertainty estimation techniques as a front-end. Compared to conventional training, the proposed training algorithm results in 3-4% absolute improvement in speaker recognition accuracy by training from either matched, unmatched or multi-condition noisy data. This algorithm is also applicable with minor modifications to maximum a posteriori (MAP) or maximum likelihood linear regression (MLLR) acoustic model adaptation from noisy data and to other data than audio.