Communications of the ACM
Equivalence of models for polynomial learnability
Information and Computation
C4.5: programs for machine learning
Efficient noise-tolerant learning from statistical queries
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Combining labeled and unlabeled data with co-training
COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Machine learning and data mining
Communications of the ACM
Noise-tolerant learning, the parity problem, and the statistical query model
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Text Classification from Labeled and Unlabeled Documents using EM
Machine Learning - Special issue on information retrieval
Analyzing the effectiveness and applicability of co-training
Proceedings of the ninth international conference on Information and knowledge management
Machine Learning
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Transductive Inference for Text Classification using Support Vector Machines
ICML '99 Proceedings of the Sixteenth International Conference on Machine Learning
Enhancing Supervised Learning with Unlabeled Data
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
On the Efficiency of Noise-Tolerant PAC Algorithms Derived from Statistical Queries
COLT '00 Proceedings of the Thirteenth Annual Conference on Computational Learning Theory
Building Text Classifiers Using Positive and Unlabeled Examples
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Learning classifiers from only positive and unlabeled data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
PE-PUC: A Graph Based PU-Learning Approach for Text Classification
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Learning to Find Relevant Biological Articles without Negative Training Examples
AI '08 Proceedings of the 21st Australasian Joint Conference on Artificial Intelligence: Advances in Artificial Intelligence
OcVFDT: one-class very fast decision tree for one-class classification of data streams
Proceedings of the Third International Workshop on Knowledge Discovery from Sensor Data
Semi-Supervised Novelty Detection
The Journal of Machine Learning Research
Bayesian classifiers for positive unlabeled learning
WAIM '11 Proceedings of the 12th international conference on Web-age information management
Accurate measurements of pointing performance from in situ observations
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Learning very fast decision tree from uncertain data streams with positive and unlabeled samples
Information Sciences: an International Journal
Learning from data streams with only positive and unlabeled data
Journal of Intelligent Information Systems
In many machine learning settings, labeled examples are difficult to collect while unlabeled data are abundant. Moreover, for some binary classification problems, positive examples, i.e., elements of the target concept, are available. Can these additional data be used to improve the accuracy of supervised learning algorithms? In this paper, we investigate the design of learning algorithms that use positive and unlabeled data only. Many machine learning and data mining algorithms, such as decision tree induction and naive Bayes, use examples only to evaluate statistical queries (SQ-like algorithms); Kearns introduced the statistical query learning model to describe such algorithms. Here, we design an algorithm scheme that transforms any SQ-like algorithm into an algorithm based on positive statistical queries (estimates of probabilities over the set of positive instances) and instance statistical queries (estimates of probabilities over the instance space). We prove that any class learnable in the statistical query learning model is learnable from positive statistical queries and instance statistical queries alone, provided that a lower bound on the weight of any target concept f can be estimated in polynomial time. We then design a decision tree induction algorithm, POSC4.5, based on C4.5, that uses only positive and unlabeled examples, and we report experimental results for this algorithm. When the classes are imbalanced, in the sense that one of the two classes (say the positive class) is heavily underrepresented compared to the other, the learning problem remains open. This case is challenging because it arises in many real-world applications.
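The algorithm scheme described in the abstract can be illustrated with a standard decomposition: a statistical query Pr_D[chi(x, f(x)) = 1] splits into a part measurable on positive examples and a part measurable on unlabeled examples, once the weight Pr_D[f(x) = 1] of the target concept is known or lower-bounded. The sketch below is a minimal reading of that idea, not the paper's own code; the function names, signatures, and synthetic data are hypothetical.

```python
import numpy as np

def estimate_sq(chi, positives, unlabeled, weight):
    """Approximate the statistical query Pr_D[chi(x, f(x)) = 1] using only
    positive examples (drawn from D conditioned on f(x) = 1) and unlabeled
    examples (drawn from D), given an estimate `weight` of Pr_D[f(x) = 1].

    Decomposition (assumed reading of the scheme):
      Pr_D[chi(x, f(x)) = 1]
        = weight * Pr_pos[chi(x, 1) = 1]                        # positive part
        + Pr_D[chi(x, 0) = 1] - weight * Pr_pos[chi(x, 0) = 1]  # negative part
    where Pr_pos is the distribution over positive instances.
    """
    # Positive statistical queries: empirical probabilities over positives.
    pos_true = np.mean([chi(x, 1) for x in positives])
    pos_false = np.mean([chi(x, 0) for x in positives])
    # Instance statistical query: empirical probability over unlabeled data.
    inst_false = np.mean([chi(x, 0) for x in unlabeled])
    return weight * pos_true + inst_false - weight * pos_false

# Example use with hypothetical data: the target concept is f(x) = [x_0 > 0],
# and chi asks whether the sign of the first feature agrees with the label.
rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(1000, 5))
positives = unlabeled[unlabeled[:, 0] > 0]

def chi(x, y):
    return int((x[0] > 0) == bool(y))

print(estimate_sq(chi, positives, unlabeled, weight=0.5))  # close to 1.0
```

In a POSC4.5-style decision tree learner, estimates of this form would stand in for the empirical class frequencies that C4.5 computes at each node before evaluating its information-gain criterion; this is one way to read the scheme, not the paper's exact pseudocode.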