Active learning is an effective method for building training sets cheaply, but it is a biased sampling process that, in many applications, fails to explore large regions of the instance space. This can result in a missed cluster effect, which significantly lowers recall and slows down learning for infrequent classes. We show that missed clusters can be avoided in sequence classification tasks by using sentences as natural multi-instance units for labeling. Co-selecting the other tokens within a chosen sentence provides an implicit exploratory component: for the task of named entity recognition on two corpora, we found that entity classes co-occur within sentences with sufficient frequency for this effect to take hold.
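To make the sentence-as-unit idea concrete, below is a minimal sketch of sentence-level uncertainty sampling under the assumption of a probabilistic tagger that yields a class distribution per token. The mean-token-entropy aggregation and the function names (`token_entropy`, `select_sentences`) are illustrative choices, not necessarily the authors' exact selection criterion; the point is that labeling whole sentences co-selects the tokens surrounding the uncertain ones.

```python
import math
from typing import List, Sequence

def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one token's class distribution (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def select_sentences(
    unlabeled: List[List[Sequence[float]]],  # per sentence: per-token class distributions
    batch_size: int,
) -> List[int]:
    """Rank sentences by mean token entropy; return indices of the top batch.

    Because the annotation unit is the whole sentence, every token in a
    selected sentence gets labeled, including tokens the scorer never asked
    about. This co-selection is the implicit exploratory component: mentions
    of infrequent entity classes ride along with the uncertain tokens that
    triggered the selection, mitigating the missed cluster effect.
    """
    scores = [
        sum(token_entropy(tok) for tok in sent) / max(len(sent), 1)
        for sent in unlabeled
    ]
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]
```

Averaging over tokens (rather than summing) keeps the score length-normalized, so long sentences are not selected merely for containing more tokens; a sum-based variant would trade annotation cost per query for more co-selected context.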