Transfer learning for tandem ASR feature extraction

Authors:
Joe Frankel;Özgür Çetin;Nelson Morgan
Affiliations:
University of Edinburgh and International Computer Science Institute;International Computer Science Institute;International Computer Science Institute
Venue:
MLMI'07 Proceedings of the 4th international conference on Machine learning for multimodal interaction
Year:
2007

Abstract

Tandem automatic speech recognition (ASR), in which a single multi-layer perceptron (MLP) or an ensemble of MLPs provides a non-linear transform of the acoustic parameters, has become a standard technique in a number of state-of-the-art systems. In this paper, we examine how to transfer learning from out-of-domain data to new tasks. Our primary focus is developing tandem features for recognizing speech from the meetings domain. We show that adapting MLPs originally trained on conversational telephone speech leads to lower word error rates than training MLPs solely on the target data. Multitask learning, in which a single MLP is trained to perform a secondary task alongside the primary one (in this case, a speech enhancement mapping from far-field to near-field signals), is also shown to be advantageous. We also present recognition experiments on broadcast news data which suggest that structure learned from English speech can be adapted to Mandarin Chinese: adapted MLPs using about 97 hours of target-language data matched the performance of tandem MLPs trained from a random initialization on 440 hours of Mandarin speech.
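The abstract describes tandem features, transfer by adaptation, and multitask training only at a high level. The following is a toy, pure-Python sketch under stated assumptions, not the authors' implementation. It assumes the common tandem recipe of appending log MLP phone posteriors to the acoustic features. It models "adaptation" as initializing the target MLP from a copy of a source-trained MLP before continuing training. It models multitask learning as a shared hidden layer with an extra linear output head, trained on a weighted sum of cross-entropy and squared error. The names `TandemMLP`, `adapt`, `tandem_features`, and `aux_weight`, the tanh hidden units, the rule for re-initializing the phone layer, and all sizes, rates, and data are hypothetical.

```python
import copy
import math
import random


def init_layer(n_out, n_in, rng):
    """Small Gaussian weights and zero biases (the random-initialization case)."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def affine(W, b, x):
    """Compute W @ x + b with plain lists."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class TandemMLP:
    """Hypothetical toy MLP: one shared tanh hidden layer feeding a phone-posterior
    head (primary task) and a linear regression head (secondary task, e.g. a
    far-field -> near-field feature mapping)."""

    def __init__(self, n_in, n_hidden, n_phones, n_aux, rng):
        self.W1, self.b1 = init_layer(n_hidden, n_in, rng)
        self.W2, self.b2 = init_layer(n_phones, n_hidden, rng)
        self.W3, self.b3 = init_layer(n_aux, n_hidden, rng)

    def forward(self, x):
        h = [math.tanh(v) for v in affine(self.W1, self.b1, x)]
        return h, softmax(affine(self.W2, self.b2, h)), affine(self.W3, self.b3, h)

    def sgd_step(self, x, phone, aux_target, lr=0.05, aux_weight=0.5):
        """One SGD step on cross-entropy + aux_weight * 0.5 * squared error.
        Returns the cross-entropy measured before the update."""
        h, post, aux = self.forward(x)
        d2 = [p - (1.0 if k == phone else 0.0) for k, p in enumerate(post)]
        d3 = [aux_weight * (a - t) for a, t in zip(aux, aux_target)]
        # Both heads back-propagate into the shared hidden layer.
        dh = [sum(self.W2[k][j] * d2[k] for k in range(len(d2)))
              + sum(self.W3[k][j] * d3[k] for k in range(len(d3)))
              for j in range(len(h))]
        d1 = [g * (1.0 - hj * hj) for g, hj in zip(dh, h)]  # tanh derivative
        for W, b, d, inp in ((self.W2, self.b2, d2, h),
                             (self.W3, self.b3, d3, h),
                             (self.W1, self.b1, d1, x)):
            for k, dk in enumerate(d):
                b[k] -= lr * dk
                for j, v in enumerate(inp):
                    W[k][j] -= lr * dk * v
        return -math.log(post[phone])


def adapt(source, n_phones_target, rng):
    """Transfer: start target training from a copy of a source-domain MLP.
    The shared hidden layer is always reused; the phone layer is re-initialized
    when the target phone inventory has a different size (assumed rule)."""
    target = copy.deepcopy(source)
    if n_phones_target != len(source.W2):
        target.W2, target.b2 = init_layer(n_phones_target, len(source.W1), rng)
    return target


def tandem_features(mlp, x):
    """Append log phone posteriors to the acoustic frame. Practical systems
    usually also decorrelate and reduce these (e.g. with PCA); omitted here."""
    _, post, _ = mlp.forward(x)
    return list(x) + [math.log(p) for p in post]


# Toy usage: dimensions and data are made up for illustration.
rng = random.Random(0)
source = TandemMLP(n_in=4, n_hidden=8, n_phones=3, n_aux=4, rng=rng)
# (training `source` on out-of-domain frames would happen here)
target = adapt(source, n_phones_target=3, rng=rng)
far_field, phone, near_field = [0.2, -0.1, 0.4, 0.0], 1, [0.1, 0.0, 0.3, 0.0]
losses = [target.sgd_step(far_field, phone, near_field) for _ in range(50)]
feats = tandem_features(target, far_field)
mandarin = adapt(source, n_phones_target=5, rng=rng)
```

In this sketch, reusing the hidden layer is what lets structure learned on the source data carry over to the target task; only the phone output layer must be relearned when the inventory changes, and the secondary head shapes the shared hidden layer through its own gradient.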