Exploiting Unlabeled Data for Improving Accuracy of Predictive Data Mining

Authors:
Kang Peng;Slobodan Vucetic;Bo Han;Hongbo Xie;Zoran Obradovic
Affiliations:
-;-;-;-;-
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Citing 8
Cited 3

Term-weighting approaches in automatic text retrieval

Information Processing and Management: an International Journal
The nature of statistical learning theory

The nature of statistical learning theory
Combining labeled and unlabeled data with co-training

COLT '98 Proceedings of the eleventh annual conference on Computational learning theory
Text Classification from Labeled and Unlabeled Documents using EM

Machine Learning - Special issue on information retrieval
Partially Supervised Classification of Text Documents

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Transforming classifier scores into accurate multiclass probability estimates

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Integrating Unlabeled Images for Image Retrieval Based on Relevance Feedback

ICPR '00 Proceedings of the International Conference on Pattern Recognition - Volume 1
Uniform object generation for optimizing one-class classifiers

The Journal of Machine Learning Research

Blocking objectionable web content by leveraging multiple information sources

ACM SIGKDD Explorations Newsletter
Multimodal subjectivity analysis of multiparty conversation

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Agreement/disagreement classification: exploiting unlabeled data using contrast classifiers

NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

Abstract

Predictive data mining typically relies on labeled data without exploiting a much larger amount of available unlabeled data. The goal of this paper is to show that using unlabeled data can be beneficial in a range of important prediction problems and therefore should be an integral part of the learning process. Given an unlabeled dataset representative of the underlying distribution and a K-class labeled sample that might be biased, our approach is to learn K contrast classifiers, each trained to discriminate a certain class of labeled data from the unlabeled population. We illustrate that contrast classifiers can be useful in one-class classification, outlier detection, density estimation, and learning from biased data. The advantages of the proposed approach are demonstrated by an extensive evaluation on synthetic data followed by real-life bioinformatics applications for (1) ranking PubMed articles by their relevance to protein disorder and (2) cost-effective enlargement of a disordered protein database.
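
The contrast-classifier idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which learner is used, so plain logistic regression is an assumption here, as are the toy Gaussian data and all function names (make_cluster, train_contrast_classifier, train_contrast_classifiers, contrast_score). The only part taken from the abstract is the training setup: one classifier per class, each separating that class's labeled points (target 1) from the unlabeled population (target 0).

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_cluster(rng, center, n):
    # Hypothetical toy data: n points drawn around `center` with unit variance.
    return [tuple(rng.gauss(c, 1.0) for c in center) for _ in range(n)]


def train_contrast_classifier(labeled_points, unlabeled_points, epochs=300, lr=0.1):
    # Assumed learner: logistic regression fitted by batch gradient descent.
    # Labeled points of one class get target 1, unlabeled points get target 0.
    data = [(x, 1.0) for x in labeled_points] + [(x, 0.0) for x in unlabeled_points]
    dim = len(data[0][0])
    w, b, n = [0.0] * dim, 0.0, len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def train_contrast_classifiers(labeled_by_class, unlabeled_points):
    # One contrast classifier per class, all sharing the same unlabeled sample.
    return {label: train_contrast_classifier(points, unlabeled_points)
            for label, points in labeled_by_class.items()}


def contrast_score(model, x):
    # Output in (0, 1); higher means x looks more like the labeled class
    # than like the unlabeled population.
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


rng = random.Random(0)
labeled = {
    "a": make_cluster(rng, (-2.0, 0.0), 40),
    "b": make_cluster(rng, (2.0, 0.0), 40),
}
# The unlabeled sample mixes both classes, standing in for the population.
unlabeled = make_cluster(rng, (-2.0, 0.0), 40) + make_cluster(rng, (2.0, 0.0), 40)
models = train_contrast_classifiers(labeled, unlabeled)

for label in sorted(models):
    left = contrast_score(models[label], (-2.0, 0.0))
    right = contrast_score(models[label], (2.0, 0.0))
    print(label, round(left, 3), round(right, 3))
```

In this toy setup each classifier should score points near its own class center higher than points near the other center. How such scores would be used for one-class classification, outlier detection, density estimation, or bias correction is described in the paper itself and is not reproduced here.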