Representative sampling for text classification using support vector machines

Authors:
Zhao Xu; Kai Yu; Volker Tresp; Xiaowei Xu; Jizhi Wang
Affiliations:
Tsinghua University, Beijing, China; Institute for Computer Science, University of Munich, Germany; Corporate Technology, Siemens AG, Munich, Germany; University of Arkansas at Little Rock, Little Rock; Tsinghua University, Beijing, China
Venue:
ECIR '03 Proceedings of the 25th European conference on IR research
Year:
2003

References (10)

Query by committee
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory

Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory

A Tutorial on Support Vector Machines for Pattern Recognition
Data Mining and Knowledge Discovery

Text Categorization with Support Vector Machines: Learning with Many Relevant Features
ECML '98 Proceedings of the 10th European Conference on Machine Learning

Relevance Feedback using Support Vector Machines
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning

Less is More: Active Learning with Support Vector Machines
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning

Employing EM and Pool-Based Active Learning for Text Classification
ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning

Support vector machine active learning with applications to text classification
The Journal of Machine Learning Research

Estimation of Dependences Based on Empirical Data (Springer Series in Statistics)

Cited by (21)

Active learning using pre-clustering
ICML '04 Proceedings of the twenty-first international conference on Machine learning

Learning concepts from large scale imbalanced data sets using support cluster machines
MULTIMEDIA '06 Proceedings of the 14th annual ACM international conference on Multimedia

Repairing self-confident active-transductive learners using systematic exploration
Pattern Recognition Letters

Optimizing estimated loss reduction for active sampling in rank learning
Proceedings of the 25th international conference on Machine learning

Dual Strategy Active Learning
ECML '07 Proceedings of the 18th European conference on Machine Learning

Actively Transfer Domain Knowledge
ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II

Improving supervised learning performance by using fuzzy clustering method to select training data
Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology - Fuzzy theory and technology with applications

Active learning for object classification: from exploration to exploitation
Data Mining and Knowledge Discovery

Active Sampling for Rank Learning via Optimizing the Area under the ROC Curve
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval

Active Learning Strategies for Multi-Label Text Classification
ECIR '09 Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval

Learning to segment from a few well-selected training images
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

A machine learning approach to sentiment analysis in multilingual Web texts
Information Retrieval

Optimistic active learning using mutual information
IJCAI '07 Proceedings of the 20th international joint conference on Artificial intelligence

Block-quantized support vector ordinal regression
IEEE Transactions on Neural Networks

A framework of automatic subject term assignment for text categorization: An indexing conception-based approach
Journal of the American Society for Information Science and Technology

Inactive learning?: difficulties employing active learning in practice
ACM SIGKDD Explorations Newsletter

Ask me better questions: active learning queries based on rule induction
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

EGAL: exploration guided active learning for TCBR
ICCBR '10 Proceedings of the 18th international conference on Case-Based Reasoning Research and Development

Active learning for hierarchical text classification
PAKDD '12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I

Training pool selection for semi-supervised learning
ISNN '12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I

Querying discriminative and representative samples for batch mode active learning
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining

Abstract

To reduce human labeling effort, there has been increasing interest in applying active learning to the training of text classifiers. This paper describes a straightforward active learning heuristic, representative sampling, which explores the clustering structure of 'uncertain' documents and identifies representative samples to present to the user for labeling, with the aim of speeding up the convergence of Support Vector Machine (SVM) classifiers. In contrast to other active learning algorithms, representative sampling explicitly addresses the problem of selecting more than one unlabeled document at a time. In an empirical study, we compared representative sampling with both random sampling and SVM active learning. The results demonstrated that representative sampling achieves excellent learning performance with fewer labeled documents and can thus reduce human effort in text classification tasks.
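The abstract describes the selection step only in outline. The Python sketch below illustrates one possible reading of it under stated assumptions: the 'uncertain' documents are taken to be those inside the current SVM's margin (|f(x)| < 1), they are grouped with plain Euclidean k-means into k clusters (k being the number of documents queried per round), and the document nearest each centroid is treated as that cluster's representative. The decision values f(x) are passed in from whatever SVM is being trained. The function names (`representative_sample`, `margin_sample`) are hypothetical, and the paper's actual similarity measure and clustering details may differ. For contrast, `margin_sample` approximates the 'SVM active learning' baseline as simply taking the k documents with the smallest |f(x)| (the "simple margin" rule).

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans(points: List[Vector], k: int, iters: int = 100,
            seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: _sq_dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:
            break  # converged: no point changed cluster
        assign = new_assign
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def representative_sample(X: Sequence[Vector], scores: Sequence[float], k: int,
                          margin: float = 1.0, seed: int = 0) -> List[int]:
    """Pick up to k unlabeled documents to query (hypothetical helper).

    X[i] is the feature vector of unlabeled document i and scores[i] is its
    SVM decision value f(x_i) under the current classifier (assumptions).
    """
    if k <= 0:
        return []
    # 1. Assumed definition of 'uncertain': inside the margin, |f(x)| < margin.
    uncertain = [i for i, s in enumerate(scores) if abs(s) < margin]
    if len(uncertain) <= k:
        return uncertain  # too few to cluster: query all of them
    # 2. Cluster the uncertain documents into k groups.
    points = [X[i] for i in uncertain]
    centroids, assign = _kmeans(points, k, seed=seed)
    # 3. From each non-empty cluster, query the member closest to its centroid.
    picks = []
    for c in range(k):
        members = [j for j, a in enumerate(assign) if a == c]
        if members:
            best = min(members, key=lambda j: _sq_dist(points[j], centroids[c]))
            picks.append(uncertain[best])
    return sorted(picks)


def margin_sample(scores: Sequence[float], k: int) -> List[int]:
    """Baseline for contrast: the k documents closest to the hyperplane."""
    return sorted(sorted(range(len(scores)), key=lambda i: abs(scores[i]))[:k])


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # uncertain group A
         [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],   # uncertain group B
         [9.0, 9.0]]                           # confidently classified
    scores = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 2.0]
    print(margin_sample(scores, 2))             # [0, 1]: both from group A
    print(representative_sample(X, scores, 2))  # [0, 3]: one per group
```

On this toy data the margin-only rule spends both queries on near-duplicates from group A, while the clustered selection queries one document from each group. This illustrates why exploiting cluster structure can help when more than one document is queried per round.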