Communications of the ACM
Equivalence of models for polynomial learnability
Information and Computation
C4.5: programs for machine learning
Efficient noise-tolerant learning from statistical queries
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Machine learning and data mining
Communications of the ACM
Noise-tolerant learning, the parity problem, and the statistical query model
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Analyzing the effectiveness and applicability of co-training
Proceedings of the ninth international conference on Information and knowledge management
Machine Learning
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Enhancing Supervised Learning with Unlabeled Data
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
On the Efficiency of Noise-Tolerant PAC Algorithms Derived from Statistical Queries
COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Building Text Classifiers Using Positive and Unlabeled Examples
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Learning classifiers from only positive and unlabeled data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
PE-PUC: A Graph Based PU-Learning Approach for Text Classification
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Learning to Find Relevant Biological Articles without Negative Training Examples
AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
OcVFDT: one-class very fast decision tree for one-class classification of data streams
Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
Semi-Supervised Novelty Detection
The Journal of Machine Learning Research
Bayesian classifiers for positive unlabeled learning
WAIM '11 Proceedings of the 12th international conference on Web-age information management
Accurate measurements of pointing performance from in situ observations
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Learning very fast decision tree from uncertain data streams with positive and unlabeled samples
Information Sciences: an International Journal
Learning from data streams with only positive and unlabeled data
Journal of Intelligent Information Systems
In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Moreover, for some binary classification problems, positive examples, i.e., elements of the target concept, are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? In this paper, we investigate the design of learning algorithms that use positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction and naive Bayes, use examples only to evaluate statistical queries (SQ-like algorithms); Kearns introduced the statistical query learning model to describe such algorithms. Here, we design an algorithm scheme that transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries alone, provided that a lower bound on the weight of any target concept f can be estimated in polynomial time. We then design a decision tree induction algorithm, POSC4.5, based on C4.5, that uses only positive and unlabeled examples, and we report experimental results for this algorithm. When the classes are imbalanced, in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other, the learning problem remains open. This case is challenging because it arises in many real-world applications.
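The algorithm scheme described in the abstract can be illustrated with a standard decomposition: a statistical query Pr_D[chi(x, f(x)) = 1] splits into a part measurable on positive examples and a part measurable on unlabeled examples, once the weight Pr_D[f(x) = 1] of the target concept is known or lower-bounded. The sketch below is a minimal reading of that idea, not the paper's own code; the function names, signatures, and synthetic data are hypothetical.

```python
import numpy as np

def estimate_sq(chi, positives, unlabeled, weight):
    """Approximate the statistical query Pr_D[chi(x, f(x)) = 1] using only
    positive examples (drawn from D conditioned on f(x) = 1) and unlabeled
    examples (drawn from D), given an estimate `weight` of Pr_D[f(x) = 1].

    Decomposition (assumed reading of the scheme):
      Pr_D[chi(x, f(x)) = 1]
        = weight * Pr_pos[chi(x, 1) = 1]                        # positive part
        + Pr_D[chi(x, 0) = 1] - weight * Pr_pos[chi(x, 0) = 1]  # negative part
    where Pr_pos is the distribution over positive instances.
    """
    # Positive statistical queries: empirical probabilities over positives.
    pos_true = np.mean([chi(x, 1) for x in positives])
    pos_false = np.mean([chi(x, 0) for x in positives])
    # Instance statistical query: empirical probability over unlabeled data.
    inst_false = np.mean([chi(x, 0) for x in unlabeled])
    return weight * pos_true + inst_false - weight * pos_false

# Example use with hypothetical data: the target concept is f(x) = [x_0 > 0],
# and chi asks whether the sign of the first feature agrees with the label.
rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(1000, 5))
positives = unlabeled[unlabeled[:, 0] > 0]

def chi(x, y):
    return int((x[0] > 0) == bool(y))

print(estimate_sq(chi, positives, unlabeled, weight=0.5))  # close to 1.0
```

In a POSC4.5-style decision tree learner, estimates of this form would stand in for the empirical class frequencies that C4.5 computes at each node before evaluating its information-gain criterion; this is one way to read the scheme, not the paper's exact pseudocode.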