Cross-Lingual Subspace Gaussian Mixture Models for Low-Resource Speech Recognition

Authors:
Liang Lu; Arnab Ghoshal; Steve Renals
Affiliations:
School of Informatics, University of Edinburgh, Edinburgh, UK (all three authors)
Venue:
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
Year:
2014

Abstract

This paper studies cross-lingual acoustic modeling in the context of subspace Gaussian mixture models (SGMMs). SGMMs factorize the acoustic model parameters into a set that is globally shared between all the states of a hidden Markov model (HMM) and another set that is specific to each HMM state. We demonstrate that the SGMM global parameters are transferable between languages, particularly when they are trained multilingually. As a result, acoustic models may be trained using limited amounts of transcribed audio by borrowing the SGMM global parameters from one or more source languages and training only the state-specific parameters on the target-language audio. Model regularization using an ℓ1-norm penalty is shown to be particularly effective at avoiding overtraining, and it leads to lower word error rates. We investigate maximum a posteriori (MAP) adaptation of the subspace parameters in order to reduce the mismatch between the SGMM global parameters of the source and target languages. In addition, monolingual and cross-lingual speaker adaptive training are used to reduce the model variance introduced by speaker differences. We systematically evaluate these techniques through experiments on the GlobalPhone corpus.
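The split between global and state-specific parameters can be illustrated with the basic SGMM formulation of Povey et al.: each HMM state j has a low-dimensional vector v_j, and each shared Gaussian index i has a mean projection M_i and a weight projection w_i, giving means μ_ji = M_i v_j and weights w_ji = exp(w_i^T v_j) / Σ_i' exp(w_i'^T v_j). The pure-Python sketch below assumes this simplified form (no substates, no speaker subspace, covariances omitted). In the cross-lingual setting, M and w would come from the source language(s) and only v_j would be estimated on target-language data. The helper `soft_threshold` is a hypothetical illustration of the proximal step commonly paired with an ℓ1 penalty, showing how such a penalty drives small entries of v_j to exactly zero; it is not necessarily the optimizer used in the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: D rows (feature dim) x S columns (subspace dim)


def matvec(m: Matrix, v: Vector) -> Vector:
    """Return the matrix-vector product m @ v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(a: Vector, b: Vector) -> float:
    """Return the inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def expand_state(v_j: Vector, M: List[Matrix], w: List[Vector]) -> Tuple[List[Vector], Vector]:
    """Derive one HMM state's Gaussian means and mixture weights (simplified SGMM).

    v_j  : state-specific vector (the part trained on the target language)
    M[i] : globally shared mean projection for Gaussian i (may be borrowed from a source language)
    w[i] : globally shared weight projection for Gaussian i (may be borrowed from a source language)
    """
    means = [matvec(M_i, v_j) for M_i in M]  # mu_ji = M_i v_j
    logits = [dot(w_i, v_j) for w_i in w]    # w_i^T v_j
    top = max(logits)                        # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]      # softmax over Gaussian index i
    return means, weights


def soft_threshold(v: Vector, lam: float) -> Vector:
    """Proximal operator of lam * ||v||_1: shrink each entry toward zero by lam."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]


if __name__ == "__main__":
    # Two shared Gaussians in a 2-D feature space with a 2-D subspace (toy values).
    M = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]
    w = [[0.0, 0.0], [0.0, 0.0]]
    means, weights = expand_state([1.0, -2.0], M, w)
    print(means, weights)
    print(soft_threshold([0.3, -1.5, 0.05], 0.2))
```

Because every state shares M and w, a new target-language state adds only the S entries of v_j, which is why the state-specific part can be estimated from little data.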