This article deals with pool-based active learning using uncertainty sampling. Existing uncertainty sampling methods select instances near the decision boundary to increase the likelihood of choosing informative examples; we argue that this heuristic is a surrogate for selecting examples that the current iteration of the learner is likely to misclassify. To model this intuition more directly, the article augments uncertainty sampling with a simple instability-based selective sampling approach, in which the instability degree of each unlabeled example is estimated during the learning process. Experiments on seven evaluation datasets show that instability-based sampling achieves significant improvements over traditional uncertainty sampling. Measured by the average percentage of actively selected examples the learner needs in order to reach 99% of the performance obtained by training on the entire dataset, both instability sampling and sampling by instability and density reduce annotation cost more effectively than random sampling and traditional entropy-based uncertainty sampling. Our experimental results also show that instability-based methods yield no significant improvement for active learning with SVMs when a popular sigmoid function is used to transform SVM outputs into posterior probabilities.
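The abstract does not specify how the instability degree is computed, so the following is only a minimal sketch of the general idea: it assumes instability is measured as the fraction of successive active-learning iterations in which the model's predicted label for an example flipped, and combines that score with entropy-based uncertainty via a weight `alpha`. The function names (`instability`, `select_batch`) and the convex combination are illustrative assumptions, not the authors' method.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted label distribution (uncertainty score)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def instability(prediction_history):
    """Hypothetical instability measure: fraction of successive active-learning
    iterations in which the model's predicted label for this example changed."""
    if len(prediction_history) < 2:
        return 0.0
    flips = sum(1 for a, b in zip(prediction_history, prediction_history[1:])
                if a != b)
    return flips / (len(prediction_history) - 1)

def select_batch(pool, histories, prob_dists, batch_size, alpha=0.5):
    """Rank unlabeled examples by a weighted combination of instability and
    entropy; return the indices of the top batch_size examples to annotate."""
    scores = {
        i: alpha * instability(histories[i]) + (1 - alpha) * entropy(prob_dists[i])
        for i in pool
    }
    return sorted(pool, key=lambda i: scores[i], reverse=True)[:batch_size]
```

A density-weighted variant (the paper's "sampling by instability and density") could multiply each score by an average-similarity term over the pool, so that unstable but outlying examples are ranked lower.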