In some classification tasks, such as the automatic building and maintenance of text corpora, it is expensive to obtain labeled examples to train a classifier. In such circumstances it is common to have massive corpora in which only a few examples are labeled while the rest are not. Semi-supervised learning techniques try to exploit the intrinsic information in unlabeled examples to improve classification models. However, these techniques assume that the labeled examples cover all the classes to be learned, which may not hold. In the presence of an imbalanced class distribution, obtaining labeled examples from minority classes can be very costly if queries are selected at random. Active learning asks an oracle to label new, carefully selected examples, and it does not assume prior knowledge of all classes. d-Confidence is an active learning approach that is effective in the presence of imbalanced training sets. In this paper we discuss the performance of d-Confidence on text corpora. We show empirically that, compared to confidence, a common active learning criterion, d-Confidence reduces the number of queries required to identify examples from all the classes to be learned.
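To make the baseline concrete, the following is a minimal sketch of the plain confidence criterion that d-Confidence is compared against: the learner queries the unlabeled example whose most probable class has the lowest posterior probability. This is a generic illustration, not the paper's d-Confidence method (which, per the abstract, additionally accounts for the distance of candidates to already-labeled classes); the array values are made up for the example.

```python
import numpy as np

def least_confidence_query(probs):
    """Return the index of the unlabeled example to query next
    under the plain confidence criterion: the example whose
    top-class posterior probability is lowest."""
    top_class_confidence = probs.max(axis=1)  # confidence per example
    return int(np.argmin(top_class_confidence))

# Toy posterior probabilities for 4 unlabeled examples, 2 classes
# (hypothetical values for illustration only).
probs = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.70, 0.30],
                  [0.60, 0.40]])

print(least_confidence_query(probs))  # example 1 is least confident
```

In an imbalanced setting this criterion tends to keep querying near the boundary of the already-known classes, which is why, as the abstract argues, a distance-aware criterion such as d-Confidence can reach examples of the still-unseen minority classes with fewer queries.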