Active learning for classifying phone sequences from unsupervised phonotactic models

Authors:
Shona Douglas
Affiliations:
AT&T Labs - Research, NJ
Venue:
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2
Year:
2003

Citing 3
Cited 1

BoosTexter: A Boosting-based System for Text Categorization
Machine Learning - Special issue on information retrieval

Employing EM and Pool-Based Active Learning for Text Classification
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning

Effective utterance classification with unsupervised phonotactic models
NAACL '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1

Automatic discovery of topics and acoustic morphemes from speech
Computer Speech and Language

Abstract

This paper describes an application of active learning methods to the classification of phone strings recognized using unsupervised phonotactic models. The only manual annotation required to train a classifier with these recognition methods is the assignment of class labels to the audio files. The work described here demonstrates that substantial savings in this effort can be obtained by actively selecting examples to be labeled using confidence scores from the BoosTexter classifier. The saving in class labeling effort is evaluated on two different spoken language system domains, in terms of both the number of utterances to be labeled and the length of the labeled utterances in phones. We show that savings in labeling effort of around 30% can be obtained using active selection of examples.
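
The selection step mentioned in the abstract, choosing which examples to label from classifier confidence scores, can be illustrated with a minimal sketch. It assumes a common active learning heuristic in which the utterances the classifier is least confident about are sent for labeling first; the abstract does not state the exact selection rule used in the paper, so this is an illustration only. The function name select_for_labeling, the utterance ids, and the scores are all hypothetical, and no real BoosTexter interface is used.

```python
from typing import Dict, List


def select_for_labeling(confidence: Dict[str, float], batch_size: int) -> List[str]:
    """Return the ids of the `batch_size` unlabeled utterances with the
    lowest classifier confidence (assumed heuristic: least confident first).

    Ties are broken by utterance id so the result is deterministic.
    """
    ranked = sorted(confidence, key=lambda utt_id: (confidence[utt_id], utt_id))
    return ranked[:batch_size]


# Hypothetical confidence scores for a pool of unlabeled utterances.
pool = {"utt1": 0.92, "utt2": 0.15, "utt3": 0.55, "utt4": 0.08}

# The two least confident utterances would be sent to a human labeler.
selected = select_for_labeling(pool, 2)
print(selected)
```

In a full active learning loop, the selected utterances would be labeled, added to the training set, and the classifier retrained before the next round of selection.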