Learning from positive and unlabeled examples with different data distributions

Authors:
Xiao-Li Li; Bing Liu
Affiliations:
Institute for Infocomm Research, Singapore; Department of Computer Science, University of Illinois at Chicago, Chicago, IL
Venue:
ECML '05: Proceedings of the 16th European Conference on Machine Learning
Year:
2005

Abstract

We study the problem of learning from positive and unlabeled examples. Although several techniques exist for this problem, they all assume that the positive examples in the positive set P and the positive examples in the unlabeled set U are generated from the same distribution. This assumption may be violated in practice. For example, suppose one wants to collect all printer pages from the Web. One can use the printer pages from one site as the set P of positive pages and the product pages from another site as U, and then classify the pages in U into printer pages and non-printer pages. Although printer pages from the two sites have many similarities, they can also be quite different, because different sites often present similar products in different styles and with different focuses. In such cases, existing methods perform poorly. This paper proposes a novel technique, A-EM, to deal with the problem. Experimental results on product page classification demonstrate the effectiveness of the proposed technique.
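To make the positive-unlabeled (PU) setting concrete, here is a minimal sketch of a generic baseline: a multinomial naive Bayes classifier trained with EM. Documents in P stay fixed as positive. Documents in U start out as negative, and their class posteriors are re-estimated on each iteration. This is an illustration under stated assumptions, not the A-EM algorithm, whose details are not given in this abstract. The function name `pu_naive_bayes_em`, the iteration count, the Laplace smoothing, and the toy token lists are all hypothetical choices. Note that this baseline relies on the same-distribution assumption the abstract questions: the positive pages in U are scored by word probabilities learned largely from P.

```python
import math
from collections import Counter


def pu_naive_bayes_em(positive_docs, unlabeled_docs, iterations=10):
    """Hypothetical helper: generic naive Bayes + EM baseline for PU learning.

    Documents in P are fixed as positive. Documents in U start as negative
    (weight 0.0), and their P(positive | doc) is re-estimated on each
    iteration. Returns the final posterior for every document in U.
    """
    docs = [Counter(d) for d in positive_docs] + [Counter(d) for d in unlabeled_docs]
    n_pos = len(positive_docs)
    vocab = set()
    for d in docs:
        vocab.update(d)
    # Soft positive-class weight per document: P -> 1.0, U -> 0.0 initially.
    weights = [1.0] * n_pos + [0.0] * len(unlabeled_docs)

    for _ in range(iterations):
        # M-step: Laplace-smoothed class prior and word counts from soft labels.
        prior_pos = (sum(weights) + 1.0) / (len(docs) + 2.0)
        pos_counts, neg_counts = Counter(), Counter()
        for w, d in zip(weights, docs):
            for word, c in d.items():
                pos_counts[word] += w * c
                neg_counts[word] += (1.0 - w) * c
        pos_total = sum(pos_counts.values()) + len(vocab)
        neg_total = sum(neg_counts.values()) + len(vocab)

        # E-step: update posteriors for unlabeled documents only; P stays positive.
        for i in range(n_pos, len(docs)):
            log_pos = math.log(prior_pos)
            log_neg = math.log(1.0 - prior_pos)
            for word, c in docs[i].items():
                log_pos += c * math.log((pos_counts[word] + 1.0) / pos_total)
                log_neg += c * math.log((neg_counts[word] + 1.0) / neg_total)
            # Normalize in log space to avoid underflow on long documents.
            m = max(log_pos, log_neg)
            p = math.exp(log_pos - m)
            q = math.exp(log_neg - m)
            weights[i] = p / (p + q)

    return weights[n_pos:]


if __name__ == "__main__":
    # Toy token lists standing in for printer pages (P) and mixed product pages (U).
    P = [["printer", "ink", "laser"], ["printer", "toner"]]
    U = [["printer", "ink"], ["camera", "lens"], ["camera", "zoom", "lens"]]
    for doc, p in zip(U, pu_naive_bayes_em(P, U)):
        print(" ".join(doc), round(p, 3))
```

In the toy run, the unlabeled page that shares vocabulary with P receives a high positive posterior, and the camera pages remain low. If the printer pages in U used a different vocabulary from those in P (for example, because they came from a differently styled site), this baseline would have little evidence to move them toward the positive class. That is the failure mode the abstract describes.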