PKDD '00 Proceedings of the 4th European Conference on Principles of Data Mining and Knowledge Discovery
Supervised learning algorithms usually require large amounts of training data to learn reasonably accurate classifiers. Yet in many text classification tasks, labeled training documents are expensive to obtain, while unlabeled documents are readily available in large quantities. This paper describes a general framework for extending any text learning algorithm to use unlabeled documents in addition to labeled documents through an Expectation-Maximization-like scheme. Our instantiation of this partially supervised classification framework with a similarity-based single-prototype classifier achieves encouraging results on two real-world text datasets. Classification error is reduced by up to 38% when unlabeled documents are used in addition to labeled documents.
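To make the idea concrete, the following is a minimal sketch of how an Expectation-Maximization-like scheme might wrap a similarity-based single-prototype classifier. It is an illustration under stated assumptions, not the paper's implementation: the cosine similarity, the hard (rather than soft) label assignments, the stopping rule, the toy term-count vectors, and all function names (`normalize_rows`, `fit_prototypes`, `predict`, `em_like_fit`) are chosen here for illustration only.

```python
import numpy as np

def normalize_rows(X):
    """Scale each row to unit L2 norm (all-zero rows stay zero)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)

def fit_prototypes(X, y, n_classes):
    """Build one prototype per class: the normalized mean of its documents."""
    protos = np.zeros((n_classes, X.shape[1]))
    for c in range(n_classes):
        members = X[y == c]
        if len(members) > 0:
            protos[c] = members.mean(axis=0)
    return normalize_rows(protos)

def predict(X, protos):
    """Assign each document to the class with the most similar prototype."""
    return np.argmax(normalize_rows(X) @ protos.T, axis=1)

def em_like_fit(X_lab, y_lab, X_unl, n_classes, max_iter=20):
    """Assumed EM-like loop with hard assignments (a simplification)."""
    X_lab, X_unl = normalize_rows(X_lab), normalize_rows(X_unl)
    y_lab = np.asarray(y_lab)
    # Initial classifier: trained on the labeled documents only.
    protos = fit_prototypes(X_lab, y_lab, n_classes)
    previous = None
    for _ in range(max_iter):
        # E-like step: guess labels for the unlabeled documents.
        guessed = predict(X_unl, protos)
        if previous is not None and np.array_equal(guessed, previous):
            break  # guessed labels no longer change
        # M-like step: retrain on labeled plus provisionally labeled documents.
        protos = fit_prototypes(
            np.vstack([X_lab, X_unl]),
            np.concatenate([y_lab, guessed]),
            n_classes,
        )
        previous = guessed
    return protos

# Toy term-count vectors over four hypothetical terms (not from the paper).
X_lab = np.array([[3, 1, 0, 0], [0, 0, 2, 3]])
y_lab = np.array([0, 1])
X_unl = np.array([[2, 2, 0, 0], [1, 3, 0, 1], [0, 1, 3, 2], [0, 0, 1, 4]])

protos = em_like_fit(X_lab, y_lab, X_unl, n_classes=2)
labels = predict(X_unl, protos)
print(labels)
```

The labeled documents fix the initial prototypes, and each iteration lets the unlabeled documents pull the prototypes toward the regions they occupy. A soft variant would weight each unlabeled document by its class similarity instead of committing to a single label.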