We study the problem of learning from positive and unlabeled examples. Several techniques exist for this problem, but they all assume that the positive examples in the positive set P and the positive examples in the unlabeled set U are generated from the same distribution. This assumption may be violated in practice. For example, suppose one wants to collect all printer pages from the Web. One can use the printer pages from one site as the set P of positive pages and the product pages from another site as U, and then classify the pages in U into printer pages and non-printer pages. Printer pages from the two sites have many similarities, but they can also be quite different, because different sites often present similar products in different styles and with different focuses. In such cases, existing methods perform poorly. This paper proposes a novel technique, A-EM, to deal with this problem. Experimental results on product page classification demonstrate the effectiveness of the proposed technique.
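The abstract does not spell out the steps of A-EM, so the sketch below is **not** A-EM. It is a minimal illustration, under stated assumptions, of one common baseline for the positive-unlabeled setting: a multinomial naive Bayes classifier refined with EM.

The baseline works as follows:

- Every document in P is fixed as positive.
- Every document in U starts out as provisionally negative.
- Only the soft labels of U are re-estimated on each iteration.

The helper names (`train_nb`, `posterior_pos`, `pu_em`) and the toy printer/camera documents are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def train_nb(docs, pos_weights, vocab, alpha=1.0):
    """M-step: fit a multinomial naive Bayes model from soft labels.

    pos_weights[i] is the current probability that docs[i] is positive.
    Returns log class priors and Laplace-smoothed per-word log likelihoods.
    """
    n = len(docs)
    w_pos = sum(pos_weights)
    w_neg = n - w_pos
    log_prior = {
        "+": math.log((w_pos + 1) / (n + 2)),
        "-": math.log((w_neg + 1) / (n + 2)),
    }
    # Each word occurrence contributes fractionally to both classes.
    counts = {"+": Counter(), "-": Counter()}
    for doc, p in zip(docs, pos_weights):
        for word, c in Counter(doc).items():
            counts["+"][word] += p * c
            counts["-"][word] += (1 - p) * c
    log_lik = {}
    for cls in ("+", "-"):
        total = sum(counts[cls].values()) + alpha * len(vocab)
        log_lik[cls] = {w: math.log((counts[cls][w] + alpha) / total) for w in vocab}
    return log_prior, log_lik


def posterior_pos(doc, log_prior, log_lik):
    """E-step for one document: P(+ | doc) under the current model."""
    score = {
        cls: log_prior[cls] + sum(log_lik[cls][w] for w in doc if w in log_lik[cls])
        for cls in ("+", "-")
    }
    m = max(score.values())  # subtract the max for numerical stability
    e_pos = math.exp(score["+"] - m)
    e_neg = math.exp(score["-"] - m)
    return e_pos / (e_pos + e_neg)


def pu_em(P, U, iterations=5):
    """Return P(+ | d) for each document d in U after a fixed number of EM rounds."""
    docs = P + U
    vocab = {w for d in docs for w in d}
    # Initialization: P is positive; every document in U is provisionally negative.
    weights = [1.0] * len(P) + [0.0] * len(U)
    for _ in range(iterations):
        log_prior, log_lik = train_nb(docs, weights, vocab)
        # Only U's labels are re-estimated; P stays fixed as positive.
        weights = [1.0] * len(P) + [posterior_pos(d, log_prior, log_lik) for d in U]
    return weights[len(P):]


# Hypothetical toy data: tokenized product pages.
P_toy = [["printer", "ink", "laser"], ["printer", "toner", "ink"]]
U_toy = [["printer", "ink", "toner"], ["camera", "lens", "zoom"], ["camera", "battery", "lens"]]
scores = pu_em(P_toy, U_toy)
print([round(s, 3) for s in scores])
```

On this toy data, the printer page in U ends up with a posterior above 0.5, and the two camera pages stay well below it. The positive-class model here is estimated largely from P. If the positive pages in U used a different site's vocabulary or style, they would receive low posteriors. This is the kind of distribution mismatch the abstract describes.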