On proper unit selection in active learning: co-selection effects for named entity recognition

Authors:
Katrin Tomanek;Florian Laws;Udo Hahn;Hinrich Schütze
Affiliations:
Friedrich-Schiller-Universität Jena, Jena, Germany;Universität Stuttgart, Germany;Friedrich-Schiller-Universität Jena, Jena, Germany;Universität Stuttgart, Germany
Venue:
HLT '09 Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing
Year:
2009



Abstract

Active learning is an effective method for creating training sets cheaply, but it is a biased sampling process that, in many applications, fails to explore large regions of the instance space. This can result in a missed cluster effect, which significantly lowers recall and slows down learning for infrequent classes. We show that missed clusters can be avoided in sequence classification tasks by using sentences as natural multi-instance units for labeling. Selecting a sentence co-selects its other tokens, which provides an implicit exploratory component: for named entity recognition on two corpora, we found that entity classes co-occur within sentences with sufficient frequency.
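
The co-selection idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how sentences are scored, so the mean-uncertainty utility, the function names `select_sentences` and `co_selected_tokens`, the threshold, and the toy uncertainty values below are all assumptions made for illustration.

```python
from typing import List, Tuple

# A sentence is a list of (token, uncertainty) pairs. The uncertainty values
# are made up here; in practice they would come from the current model.
Sentence = List[Tuple[str, float]]


def select_sentences(pool: List[Sentence], k: int) -> List[int]:
    """Return the indices of the k sentences with the highest utility.

    Assumption: utility is the mean token uncertainty of the sentence.
    """
    def utility(sentence: Sentence) -> float:
        if not sentence:
            return 0.0  # an empty sentence carries no information
        return sum(u for _, u in sentence) / len(sentence)

    ranked = sorted(range(len(pool)), key=lambda i: utility(pool[i]), reverse=True)
    return ranked[:k]


def co_selected_tokens(pool: List[Sentence], chosen: List[int],
                       threshold: float) -> List[str]:
    """Tokens labeled only because their sentence was chosen.

    These tokens have uncertainty below the (hypothetical) threshold, so a
    purely token-level selector would not have queried them.
    """
    return [tok for i in chosen for tok, u in pool[i] if u < threshold]


example_pool: List[Sentence] = [
    [("Acme", 0.9), ("hired", 0.1), ("Smith", 0.05)],
    [("The", 0.02), ("weather", 0.03), ("improved", 0.04)],
    [("Visit", 0.1), ("Paris", 0.4)],
]

chosen = select_sentences(example_pool, k=1)
print(chosen)                                         # [0]
print(co_selected_tokens(example_pool, chosen, 0.5))  # ['hired', 'Smith']
```

In this toy pool the model is confidently wrong about "Smith" (low uncertainty), so a token-level selector would never query it. Because the whole sentence is the labeling unit, the annotator labels it anyway, which is the exploratory side effect the abstract describes.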