Good seed makes a good crop: accelerating active learning using language modeling

Authors:
Dmitriy Dligach;Martha Palmer
Affiliations:
University of Colorado at Boulder;University of Colorado at Boulder
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Abstract

Active Learning (AL) is typically initialized with a small seed of randomly selected examples. However, when the distribution of classes in the data is skewed, a random seed may miss some classes entirely, which slows learning. Our contribution is twofold: (1) we show that an unsupervised technique based on language modeling is effective at selecting rare-class examples, and (2) we use this technique to seed AL and demonstrate that it leads to a higher learning rate. The evaluation is conducted in the context of word sense disambiguation.
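
The seeding idea can be illustrated with a minimal sketch. The abstract does not specify the language model or the scoring function, so everything below is an assumption made for illustration: a unigram model with add-one smoothing trained on the unlabeled pool, a per-token average log probability as the score, and the hypothetical helper names `train_unigram`, `avg_log_prob`, and `select_seed`. The intuition it shows is that examples the model finds least probable are more likely to belong to rare classes, so they are picked for the seed in place of a random sample.

```python
import math
from collections import Counter


def train_unigram(pool):
    """Count tokens over the whole unlabeled pool (no labels are needed)."""
    counts = Counter(tok for example in pool for tok in example)
    total = sum(counts.values())
    return counts, total


def avg_log_prob(example, counts, total):
    """Average per-token log probability under add-one smoothing.

    Assumption: length normalization keeps long contexts from being
    selected merely because they contain more tokens.
    """
    vocab = len(counts) + 1  # one extra slot reserved for unseen tokens
    logp = sum(math.log((counts[tok] + 1) / (total + vocab)) for tok in example)
    return logp / max(len(example), 1)


def select_seed(pool, k):
    """Return the indices of the k examples the model finds least probable."""
    counts, total = train_unigram(pool)
    scores = [avg_log_prob(example, counts, total) for example in pool]
    return sorted(range(len(pool)), key=lambda i: scores[i])[:k]


# Toy pool of tokenized contexts for the ambiguous word "bank":
# five contexts of the frequent sense, one context of a rare sense.
pool = [["deposit", "money", "bank"]] * 5 + [["river", "bank", "shore"]]
seed_indices = select_seed(pool, 1)
```

On this toy pool the single selected index is that of the "river" context, the only instance of the rare sense. The selected examples would then be labeled by an annotator and used as the initial training set for the AL loop.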