Active learning with sampling by uncertainty and density for data annotations

  • Authors:
  • Jingbo Zhu;Huizhen Wang;Benjamin K. Tsou;Matthew Ma

  • Affiliations:
Key Laboratory of Medical Image Computing, Ministry of Education, and the Natural Language Processing Lab, Northeastern University, Shenyang, China; Key Laboratory of Medical Image Computing, Ministry of Education, and the Natural Language Processing Lab, Northeastern University, Shenyang, China; Language Information Sciences Research Center, City University of Hong Kong, Hong Kong, China; Scientific Works, NJ

  • Venue:
  • IEEE Transactions on Audio, Speech, and Language Processing
  • Year:
  • 2010


Abstract

To address the knowledge bottleneck problem, active learning has been widely used for its ability to automatically select the most informative unlabeled examples for human annotation. One of the key enabling techniques of active learning is uncertainty sampling, which uses a classifier to identify the unlabeled examples on which it is least confident. Uncertainty sampling, however, often runs into problems when outliers are selected. To mitigate the outlier problem, this paper presents two techniques: sampling by uncertainty and density (SUD) and density-based re-ranking. Both techniques prefer examples that are not only the most informative under the uncertainty criterion but also the most representative under the density criterion. Experimental results of active learning for word sense disambiguation and text classification tasks using six real-world evaluation data sets demonstrate the effectiveness of the proposed methods.
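The combination described in the abstract can be illustrated with a minimal sketch: score each unlabeled example by its uncertainty (here, least confidence) weighted by its density (here, average cosine similarity to its K most similar unlabeled examples), so that uncertain but isolated outliers are down-ranked. The function names, the least-confidence measure, and the K-nearest-neighbor density estimate are illustrative assumptions; the paper's exact SUD formulation (e.g., its uncertainty measure and density definition) may differ.

```python
import math

def least_confidence(probs):
    # Uncertainty: 1 minus the classifier's top posterior probability
    # (illustrative choice; other measures such as entropy also fit here).
    return 1.0 - max(probs)

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def density(i, vectors, k=2):
    # Density: average cosine similarity to the K most similar other
    # unlabeled examples; outliers get low density.
    sims = sorted(
        (cosine(vectors[i], v) for j, v in enumerate(vectors) if j != i),
        reverse=True,
    )[:k]
    return sum(sims) / len(sims) if sims else 0.0

def sud_scores(probs_list, vectors, k=2):
    # SUD-style score: uncertainty weighted by density, so the sampler
    # prefers examples that are both informative and representative.
    return [
        least_confidence(p) * density(i, vectors, k)
        for i, p in enumerate(probs_list)
    ]

# Toy pool: b is uncertain and lies in a dense region; c is equally
# uncertain but is an outlier, so SUD ranks b above c.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
probs = [[0.9, 0.1], [0.5, 0.5], [0.5, 0.5]]
scores = sud_scores(probs, vectors, k=2)
```

With plain uncertainty sampling, b and c would tie; the density weight breaks the tie in favor of the representative example b.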