Learning classifiers from only positive and unlabeled data

Authors:
Charles Elkan; Keith Noto
Affiliations:
University of California, San Diego, La Jolla, CA, USA (both authors)
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

The input to an algorithm that learns a binary classifier normally consists of two sets of examples, where one set consists of positive examples of the concept to be learned, and the other set consists of negative examples. However, it is often the case that the available training data are an incomplete set of positive examples, and a set of unlabeled examples, some of which are positive and some of which are negative. The problem solved in this paper is how to learn a standard binary classifier given a nontraditional training set of this nature. Under the assumption that the labeled examples are selected randomly from the positive examples, we show that a classifier trained on positive and unlabeled examples predicts probabilities that differ by only a constant factor from the true conditional probabilities of being positive. We show how to use this result in two different ways to learn a classifier from a nontraditional training set. We then apply these two new methods to solve a real-world problem: identifying protein records that should be included in an incomplete specialized molecular biology database. Our experiments in this domain show that models trained using the new methods perform better than the current state-of-the-art biased SVM method for learning from positive and unlabeled examples.
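The constant-factor result in the abstract can be illustrated with a small simulation. Write s = 1 for a labeled example and y = 1 for a true positive. If labeled examples are drawn at random from the positives, then p(s=1|x) = c * p(y=1|x), where c = p(s=1|y=1) is the fraction of positives that were labeled. The sketch below is a toy illustration under stated assumptions, not the paper's implementation. A per-bin frequency table stands in for the trained classifier. The function names (`simulate`, `fit_g`, `estimate_c`, `correct`) are hypothetical. Estimating c as the largest per-bin score is a simplification chosen for this sketch, and it is only valid when at least one bin contains positives only.

```python
import random


def simulate(n, p_pos_given_x, c, seed=0):
    """Draw n (x, s) pairs. x is a bin index; s = 1 means 'labeled positive'.

    The true class y is sampled internally but never returned, so the
    learner only ever sees positive-labeled versus unlabeled examples.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.randrange(len(p_pos_given_x))           # uniform over bins
        y = 1 if rng.random() < p_pos_given_x[x] else 0
        # Positives are labeled with the same probability c regardless of x.
        s = 1 if (y == 1 and rng.random() < c) else 0
        rows.append((x, s))
    return rows


def fit_g(rows, n_bins):
    """Nontraditional classifier: g[b] estimates p(s=1 | x=b) by frequency."""
    labeled = [0] * n_bins
    total = [0] * n_bins
    for x, s in rows:
        total[x] += 1
        labeled[x] += s
    return [labeled[b] / total[b] for b in range(n_bins)]


def estimate_c(g):
    """Assumption of this sketch: some bin is purely positive, so max(g) ~ c."""
    return max(g)


def correct(g, c_hat):
    """Rescale g by the constant factor to estimate p(y=1 | x)."""
    return [min(1.0, v / c_hat) for v in g]


if __name__ == "__main__":
    p_true = [0.1, 0.4, 0.7, 1.0]    # illustrative p(y=1 | x) per bin
    rows = simulate(200_000, p_true, c=0.3)
    g = fit_g(rows, len(p_true))
    c_hat = estimate_c(g)
    for b, (gb, fb) in enumerate(zip(g, correct(g, c_hat))):
        print(f"bin {b}: g={gb:.3f}  corrected={fb:.3f}  true={p_true[b]:.1f}")
    print(f"estimated c = {c_hat:.3f}")
```

Running the script prints, for each bin, the raw score g, the rescaled estimate, and the true probability used to generate the data. The raw scores should sit well below the true probabilities, and dividing by the estimated c should bring them close.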