Effective multi-label active learning for text classification

Authors:
Bishan Yang;Jian-Tao Sun;Tengjiao Wang;Zheng Chen
Affiliations:
Peking University, Beijing, China;Microsoft Research Asia, Beijing, China;Peking University, Beijing, China;Microsoft Research Asia, Beijing, China
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009

Abstract

Labeling text data is time-consuming but essential for automatic text classification. In particular, manually creating multiple labels for each document may become impractical when a very large amount of data is needed to train multi-label text classifiers. To minimize human labeling effort, we propose a novel multi-label active learning approach that reduces the amount of labeled data required without sacrificing classification accuracy. Traditional active learning algorithms can only handle single-label problems, in which each data point is restricted to one label. Our approach takes multi-label information into account and selects the unlabeled data that lead to the largest reduction in expected model loss. Specifically, the model loss is approximated by the size of the version space, and the reduction rate of the version space size is optimized with Support Vector Machines (SVMs). An effective label prediction method is designed to predict the possible labels for each unlabeled data point, and the expected loss for multi-label data is approximated by summing the losses over all labels according to the most confident label prediction result. Experiments on several publicly available real-world data sets demonstrate that our approach obtains promising classification results with far less labeled data than state-of-the-art methods.
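
The selection step described in the abstract might be sketched roughly as follows. This is an illustrative reading, not the authors' implementation. It assumes one binary SVM per label has already produced a signed margin for every unlabeled document, uses (1 - y * f(x)) / 2 as a stand-in for the per-label loss reduction, and substitutes simple sign thresholding for the paper's label prediction method, which the abstract does not specify. All function names are hypothetical.

```python
from typing import List, Sequence


def predict_labels(margins: Sequence[float]) -> List[int]:
    """Stand-in label prediction (an assumption): +1 if the margin is positive, else -1."""
    return [1 if m > 0 else -1 for m in margins]


def expected_loss_reduction(margins: Sequence[float]) -> float:
    """Sum an assumed per-label loss reduction, (1 - y * f(x)) / 2, over all labels.

    `margins` holds one signed SVM decision value per label for a single document;
    y is the label predicted for that document by `predict_labels`.
    """
    labels = predict_labels(margins)
    return sum((1.0 - y * m) / 2.0 for y, m in zip(labels, margins))


def select_query(pool_margins: Sequence[Sequence[float]]) -> int:
    """Return the index of the unlabeled document with the largest summed score."""
    scores = [expected_loss_reduction(m) for m in pool_margins]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Three unlabeled documents, three labels each (made-up margins).
    pool = [
        [2.0, -1.5, 0.9],    # confidently classified on every label
        [0.1, -0.2, 0.05],   # close to every decision boundary
        [1.0, 1.0, -1.0],    # exactly on the margin for every label
    ]
    print("query document index:", select_query(pool))
```

Under these assumptions, documents that lie close to many per-label decision boundaries receive the highest scores, so they are the ones sent to the annotator first.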