In many learning problems, labeled examples are rare or expensive, while numerous unlabeled and positive examples are available. However, most learning algorithms only use labeled examples. We therefore address the problem of learning with the help of positive and unlabeled data, given only a small number of labeled examples. We present both theoretical and empirical arguments showing that learning algorithms can be improved by the use of both unlabeled and positive data. As an illustrative problem, we consider the statistical-query learning of monotone conjunctions in the presence of classification noise and give empirical evidence for our assumptions. We give theoretical results on the improvement of Statistical Query learning algorithms from positive and unlabeled data. Lastly, we apply these ideas to tree induction algorithms. We modify the code of C4.5 to obtain an algorithm which takes as input a set LAB of labeled examples, a set POS of positive examples, and a set UNL of unlabeled examples, and which uses all three sets to construct the decision tree. We provide experimental results based on data taken from the UCI repository which confirm the relevance of this approach.
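The central estimation trick behind learning from positive and unlabeled data can be sketched as follows. Assuming an estimate `prior` of the positive-class probability Pr[f=1] is available (the paper's own estimators and guarantees differ in detail), any statistical query of the form Pr[h=1, f=1] or Pr[h=1, f=0] can be approximated from a positive sample and an unlabeled sample alone. The function name and signature below are illustrative, not the authors' code:

```python
def estimate_queries(h, pos, unl, prior):
    """Estimate Pr[h(x)=1, f(x)=1] and Pr[h(x)=1, f(x)=0] from a
    positive sample `pos` and an unlabeled sample `unl`, given an
    estimate `prior` of Pr[f(x)=1].  Hypothetical sketch of the
    positive/unlabeled statistical-query estimation idea.
    """
    # Pr[h(x)=1 | f(x)=1], estimated on the positive sample.
    p_h_given_pos = sum(1 for x in pos if h(x)) / len(pos)
    # Pr[h(x)=1], estimated on the unlabeled sample.
    p_h = sum(1 for x in unl if h(x)) / len(unl)
    # Pr[h=1, f=1] = Pr[f=1] * Pr[h=1 | f=1].
    p_h1_f1 = prior * p_h_given_pos
    # Pr[h=1, f=0] = Pr[h=1] - Pr[h=1, f=1]; clamp sampling noise at 0.
    p_h1_f0 = max(0.0, p_h - p_h1_f1)
    return p_h1_f1, p_h1_f0
```

On an exhaustively enumerated Boolean domain with target f(x) = x0 AND x1 and hypothesis h(x) = x0, both estimates recover the true joint probabilities exactly, since no sampling error is involved.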
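For the tree-induction part, the same estimate can drive an entropy-based split criterion: at a candidate node, the fraction of positives among examples reaching the node is obtained by Bayes' rule from the POS and UNL samples, then plugged into the usual binary entropy. This is a minimal sketch of that idea, assuming a known class prior; the helper names are hypothetical and this is not the authors' modified C4.5 code:

```python
from math import log2

def node_pos_ratio(reaches, pos, unl, prior):
    """Estimate Pr[f=1 | x reaches node] from positive and unlabeled
    samples, given an estimate `prior` of Pr[f=1] (illustrative)."""
    # Pr[x reaches node], from the unlabeled sample.
    p_node = sum(1 for x in unl if reaches(x)) / len(unl)
    # Pr[x reaches node | f=1], from the positive sample.
    p_node_pos = sum(1 for x in pos if reaches(x)) / len(pos)
    if p_node == 0.0:
        return 0.0
    # Bayes' rule; clamp at 1 to absorb sampling noise.
    return min(1.0, prior * p_node_pos / p_node)

def binary_entropy(q):
    """Entropy of a node whose estimated positive-class ratio is q."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * log2(q) - (1.0 - q) * log2(1.0 - q)
```

A C4.5-style learner would compute this entropy for each child of a candidate split and pick the split maximizing information gain, exactly as in the labeled-data case, but with class ratios estimated from POS and UNL instead of counted from LAB.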