Learning to Classify Documents with Only a Small Positive Training Set

  • Authors:
  • Xiao-Li Li; Bing Liu; See-Kiong Ng

  • Affiliations:
  • Institute for Infocomm Research, Heng Mui Keng Terrace, 119613, Singapore; Department of Computer Science, University of Illinois at Chicago, IL 60607-7053; Institute for Infocomm Research, Heng Mui Keng Terrace, 119613, Singapore

  • Venue:
  • ECML '07: Proceedings of the 18th European Conference on Machine Learning
  • Year:
  • 2007

Abstract

Many real-world classification applications fall into the class of positive and unlabeled (PU) learning problems. In many such applications, not only may the negative training examples be missing, but the number of positive examples available for learning may also be fairly limited due to the impracticality of hand-labeling a large number of training examples. Current PU learning techniques have focused mostly on identifying reliable negative instances from the unlabeled set U. In this paper, we address the oft-overlooked PU learning problem in which the number of training examples in the positive set P is small. We propose a novel technique, LPLP (Learning from Probabilistically Labeled Positive examples), and apply the approach to classify product pages from commercial websites. The experimental results demonstrate that our approach significantly outperforms existing methods, even in the challenging cases where the positive examples in P and the hidden positive examples in U were not drawn from the same distribution.
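
To make the PU setting concrete, below is a minimal sketch (in Python with scikit-learn) of the generic "reliable negative" extraction step that the abstract attributes to prior PU learning techniques; it is not the paper's LPLP method. The function name, feature representation, and cutoff fraction are illustrative assumptions.

# Illustrative sketch, not the paper's LPLP method: the generic step of
# pulling "reliable negatives" out of the unlabeled set U, as used by
# earlier PU learning techniques. Names and thresholds are assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def reliable_negatives(P_docs, U_docs, fraction=0.1):
    """Return the documents in U that look least like the positives in P."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(list(P_docs) + list(U_docs))
    X_u = X[len(P_docs):]

    # Step 1: treat every document in U as a (noisy) negative and fit a
    # classifier against the labeled positives in P.
    y = np.r_[np.ones(len(P_docs)), np.zeros(len(U_docs))]
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Step 2: the documents in U with the lowest predicted positive
    # probability are kept as "reliable negatives" for later training.
    p_u = clf.predict_proba(X_u)[:, 1]
    cutoff = np.quantile(p_u, fraction)
    return [doc for doc, p in zip(U_docs, p_u) if p <= cutoff]

The sketch only covers the preliminary step that existing methods rely on; the paper's contribution addresses the case where P itself is small, which this step alone does not handle.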