Active learning for acoustic speech recognition modeling
We propose a unified global entropy reduction maximization (GERM) framework for active learning and semi-supervised learning in speech recognition. Active learning selects a limited subset of utterances for transcription from a large pool of un-transcribed utterances, while semi-supervised learning selects the right transcriptions for un-transcribed utterances, so that the accuracy of the automatic speech recognition system is maximized. We show that both traditional confidence-based active learning and confidence-based semi-supervised learning can be improved by maximizing the lattice entropy reduction over the whole dataset. We introduce our criterion and framework, show how the criterion can be simplified and approximated, and describe how the two approaches can be combined. We demonstrate the effectiveness of the new framework and algorithms on directory assistance data collected under real usage scenarios, and show that our GERM-based active learning and semi-supervised learning algorithms consistently outperform their confidence-based counterparts by a significant margin. Our active learning algorithm reduces the number of utterances that must be transcribed to reach a given recognition accuracy by 50% relative to confidence-based active learning, and by 60% relative to random sampling. Our semi-supervised algorithm determines, in a principled way, the cutoff point for which utterance-transcription pairs to use; the point it finds is very close to the achievable peak.
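To make the selection idea concrete, the sketch below shows a minimal uncertainty-based ranking of un-transcribed utterances. It uses Shannon entropy of each utterance's hypothesis posteriors as a simple stand-in for the full GERM lattice-entropy-reduction criterion described in the abstract; the function names, the toy posteriors, and the per-utterance (rather than whole-lattice) scoring are all illustrative assumptions, not the paper's actual algorithm.

```python
import math

def entropy(probs):
    # Shannon entropy (nats) of a hypothesis posterior distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_utterances(posteriors, budget):
    # Greedy sketch: rank utterances by posterior entropy (uncertainty)
    # and pick the top-`budget` for human transcription. GERM instead
    # maximizes entropy reduction over the whole lattice/dataset.
    ranked = sorted(posteriors.items(),
                    key=lambda kv: entropy(kv[1]),
                    reverse=True)
    return [utt_id for utt_id, _ in ranked[:budget]]

# Toy example: hypothetical per-utterance hypothesis posteriors.
posteriors = {
    "utt1": [0.9, 0.1],         # confident  -> low entropy
    "utt2": [0.5, 0.5],         # uncertain  -> high entropy
    "utt3": [0.7, 0.2, 0.1],
}
print(select_utterances(posteriors, 2))
```

A confidence-based baseline would instead threshold on the top hypothesis probability; the entropy score uses the full posterior, which is one step toward the lattice-level view the GERM criterion takes.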