Using out-of-language data to improve an under-resourced speech recognizer

Authors:
David Imseng;Petr Motlicek;Hervé Bourlard;Philip N. Garner
Affiliations:
Idiap Research Institute, Martigny, Switzerland and Ecole Polytechnique Fédérale, Lausanne (EPFL), Switzerland;Idiap Research Institute, Martigny, Switzerland;Idiap Research Institute, Martigny, Switzerland and Ecole Polytechnique Fédérale, Lausanne (EPFL), Switzerland;Idiap Research Institute, Martigny, Switzerland
Venue:
Speech Communication
Year:
2014

Abstract

Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we report how to boost the performance of an Afrikaans automatic speech recognition system by using already available Dutch data. We successfully exploit available multilingual resources through (1) posterior features, estimated by multilayer perceptrons (MLPs), and (2) subspace Gaussian mixture models (SGMMs). Both the MLPs and the SGMMs can be trained on out-of-language data. We use three different acoustic modeling techniques, namely Tandem, Kullback-Leibler divergence based HMMs (KL-HMMs), and SGMMs, and show that the proposed multilingual systems yield a 12% relative improvement compared to a conventional monolingual HMM/GMM system trained only on Afrikaans. We also show that KL-HMMs are extremely powerful for under-resourced languages: using only six minutes of Afrikaans data (in combination with out-of-language data), KL-HMM yields about 30% relative improvement compared to conventional maximum likelihood linear regression and maximum a posteriori based acoustic model adaptation.
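
To make the KL-HMM idea concrete, the following minimal Python sketch scores one frame of MLP posterior features against per-state categorical distributions using the Kullback-Leibler divergence. It is an illustration only and not the authors' implementation: the three-class posterior vector, the two state distributions, the names `kl_divergence`, `posterior`, and `states`, and the choice of divergence direction (state distribution against the posterior feature) are all assumptions made for this example.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; eps guards against log(0).
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical phone posterior vector produced by an MLP for one frame.
posterior = [0.7, 0.2, 0.1]

# Hypothetical categorical distributions attached to two HMM states.
states = {
    "state_a": [0.8, 0.1, 0.1],
    "state_b": [0.1, 0.1, 0.8],
}

# Local score per state: a lower divergence means a better match.
scores = {name: kl_divergence(dist, posterior) for name, dist in states.items()}
best_state = min(scores, key=scores.get)
print(best_state, scores)
```

In this toy setting the state whose distribution lies closest to the posterior vector receives the lowest score. Because each state is described only by a small categorical distribution, few target-language parameters need to be estimated, which is one plausible reading of why such models suit very small amounts of data.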