Partially Supervised Text Classification: Combining Labeled and Unlabeled Documents Using an EM-like Scheme

Authors:
Carsten Lanquillon
Affiliations:
-
Venue:
ECML '00 Proceedings of the 11th European Conference on Machine Learning
Year:
2000


Abstract

Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet, in many text classification tasks, labeled training documents are expensive to obtain, while unlabeled documents are readily available in large quantities. This paper describes a general framework for extending any text learning algorithm to utilize unlabeled documents in addition to labeled documents using an Expectation-Maximization-like scheme. Our instantiation of this partially supervised classification framework with a similarity-based single prototype classifier achieves encouraging results on two real-world text datasets. Classification error is reduced by up to 38% when using unlabeled documents in addition to labeled documents.
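
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of how an EM-like scheme around a similarity-based single prototype classifier might look, not the paper's actual method. It assumes documents are already term-weight vectors, that each class is represented by the normalized mean of its document vectors, that similarity is cosine similarity, and that unlabeled documents receive hard class assignments in each iteration. All function names (`normalize_rows`, `fit_prototypes`, `predict`, `em_like_training`) and the toy data are hypothetical.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def fit_prototypes(X, y, n_classes):
    """Build one prototype per class: the normalized mean of its documents."""
    protos = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        members = X[y == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)
    return normalize_rows(protos)

def predict(protos, X):
    """Assign each document to the class of its most similar prototype."""
    return np.argmax(X @ protos.T, axis=1)

def em_like_training(X_lab, y_lab, X_unl, n_classes, max_iter=20):
    """Alternate between labeling unlabeled documents and retraining."""
    X_lab = normalize_rows(np.asarray(X_lab, dtype=float))
    X_unl = normalize_rows(np.asarray(X_unl, dtype=float))
    y_lab = np.asarray(y_lab)

    # Initial classifier: trained on the labeled documents only.
    protos = fit_prototypes(X_lab, y_lab, n_classes)
    y_unl = predict(protos, X_unl)

    for _ in range(max_iter):
        # M-like step: retrain on labeled plus provisionally labeled documents.
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, y_unl])
        protos = fit_prototypes(X_all, y_all, n_classes)

        # E-like step: reassign the unlabeled documents (hard assignment).
        new_y_unl = predict(protos, X_unl)
        if np.array_equal(new_y_unl, y_unl):
            break  # assignments are stable
        y_unl = new_y_unl

    return protos, y_unl

# Toy data (assumed): four terms, two classes, one labeled document per class.
X_lab = [[3, 1, 0, 0], [0, 0, 1, 3]]
y_lab = [0, 1]
X_unl = [[2, 2, 0, 0], [1, 3, 0, 0], [0, 0, 3, 1], [0, 0, 2, 2]]

protos, y_unl = em_like_training(X_lab, y_lab, X_unl, n_classes=2)
```

In this sketch the unlabeled documents pull each prototype toward terms that the labeled documents alone underrepresent, which is the intuition behind combining both kinds of data. A soft (probabilistic) assignment in the E-like step would be an equally plausible variant.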