Predictive data mining typically relies on labeled data without exploiting a much larger amount of available unlabeled data. The goal of this paper is to show that using unlabeled data can be beneficial in a range of important prediction problems and therefore should be an integral part of the learning process. Given an unlabeled dataset representative of the underlying distribution and a K-class labeled sample that might be biased, our approach is to learn K contrast classifiers, each trained to discriminate a certain class of labeled data from the unlabeled population. We illustrate that contrast classifiers can be useful in one-class classification, outlier detection, density estimation, and learning from biased data. The advantages of the proposed approach are demonstrated by an extensive evaluation on synthetic data followed by real-life bioinformatics applications for (1) ranking PubMed articles by their relevance to protein disorder and (2) cost-effective enlargement of a disordered protein database.
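As a rough illustration of the setup described above, the following minimal sketch trains one contrast classifier per class, each separating that class's labeled examples from the unlabeled pool. The abstract does not name a base learner or a dataset, so everything concrete here is an assumption: the one-dimensional logistic regression, the synthetic Gaussian data, and the names `train_logistic`, `score`, `labeled`, `unlabeled`, and `models` are hypothetical choices made only to keep the example self-contained.

```python
import math
import random

random.seed(0)  # fixed seed so the synthetic data is reproducible


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, epochs=300, lr=0.1):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent.

    Assumed base learner: the abstract does not say which model is used.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def score(model, x):
    """Contrast classifier output: estimated probability that x comes
    from the labeled class rather than from the unlabeled population."""
    w, b = model
    return sigmoid(w * x + b)


# Hypothetical synthetic data: two classes centred at -2 and +2.
centers = {0: -2.0, 1: 2.0}
labeled = {k: [random.gauss(mu, 1.0) for _ in range(200)]
           for k, mu in centers.items()}
# The unlabeled pool is drawn from the overall mixture of both classes.
unlabeled = [random.gauss(random.choice([-2.0, 2.0]), 1.0)
             for _ in range(400)]

# One contrast classifier per class: labeled class k (target 1)
# versus the whole unlabeled population (target 0).
models = {}
for k, xs_k in labeled.items():
    xs = xs_k + unlabeled
    ys = [1.0] * len(xs_k) + [0.0] * len(unlabeled)
    models[k] = train_logistic(xs, ys)

for x in (-2.0, 0.0, 2.0):
    print(x, [round(score(models[k], x), 3) for k in sorted(models)])
```

In this sketch a point that scores low under every contrast classifier looks like the unlabeled population but unlike any labeled class, which is one plausible way to read the outlier-detection and biased-sample uses mentioned above; how the paper actually combines the K outputs is not stated in the abstract.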