Unsupervised training and directed manual transcription for LVCSR

  • Authors:
  • Kai Yu; Mark Gales; Lan Wang; Philip C. Woodland

  • Affiliations:
  • Machine Intelligence Lab, Cambridge University Engineering Department, Cambridge CB2 1PZ, UK (all authors)

  • Venue:
  • Speech Communication
  • Year:
  • 2010

Abstract

A significant cost in obtaining acoustic training data is the generation of accurate transcriptions. When no transcription is available, unsupervised training techniques must be used. Furthermore, discriminative training has become a standard feature of state-of-the-art large vocabulary continuous speech recognition (LVCSR) systems. In unsupervised training, unlabelled data are recognised using a seed model, and the hypotheses from the recognition system are used as transcriptions for training. In contrast to maximum likelihood training, the performance of discriminative training is more sensitive to the quality of the transcriptions. One approach to deal with this issue is data selection, where only well-recognised data are selected for training. A more effective approach, and the key contribution of this work, is an active learning technique: directed manual transcription. Here a relatively small amount of poorly recognised data is manually transcribed to supplement the automatic transcriptions. Experiments show that using the data selection approach for discriminative training yields disappointing performance improvements on data that is mismatched to the type of data used to train the seed model. However, the directed manual transcription approach yields significant improvements in recognition accuracy on all types of data.
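The two strategies described in the abstract both start from per-utterance recogniser confidence. The split can be sketched as follows (a minimal illustration only; the function name, data layout, and threshold are hypothetical, not from the paper):

```python
# Hedged sketch: partition automatically recognised utterances by confidence.
# High-confidence hypotheses are kept as training transcriptions (data
# selection); low-confidence utterances are routed to a human transcriber
# (directed manual transcription). The threshold value is illustrative.

def partition_utterances(utterances, confidence_threshold=0.7):
    """Partition (utt_id, hypothesis, confidence) triples.

    Returns (auto_transcribed, needs_manual): high-confidence hypotheses
    are used directly as transcriptions; low-confidence utterance IDs are
    flagged for manual transcription.
    """
    auto_transcribed, needs_manual = [], []
    for utt_id, hypothesis, confidence in utterances:
        if confidence >= confidence_threshold:
            auto_transcribed.append((utt_id, hypothesis))
        else:
            needs_manual.append(utt_id)
    return auto_transcribed, needs_manual


if __name__ == "__main__":
    recognised = [
        ("utt1", "hello world", 0.92),
        ("utt2", "speech recognition", 0.55),
        ("utt3", "acoustic model", 0.81),
    ]
    auto, manual = partition_utterances(recognised)
    print(auto)    # high-confidence: kept as automatic transcriptions
    print(manual)  # low-confidence: directed for manual transcription
```

In this sketch, pure data selection would simply discard `needs_manual`; the paper's directed manual transcription instead spends a small transcription budget on exactly those poorly recognised utterances.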