Using weighted nearest neighbor to benefit from unlabeled data

  • Authors:
  • Kurt Driessens (Department of Computer Science, K.U. Leuven, Belgium)
  • Peter Reutemann (Department of Computer Science, University of Waikato, Hamilton, New Zealand)
  • Bernhard Pfahringer (Department of Computer Science, University of Waikato, Hamilton, New Zealand)
  • Claire Leschi (Institut National des Sciences Appliquées, Lyon, France)

  • Venue:
  • PAKDD'06: Proceedings of the 10th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2006

Abstract

The development of data-mining applications such as text classification and molecular profiling has shown the need for machine learning algorithms that can benefit from both labeled and unlabeled data, where the unlabeled examples often greatly outnumber the labeled ones. In this paper we present a two-stage classifier that improves its predictive accuracy by making use of the available unlabeled data. It applies a weighted nearest neighbor classification algorithm to the combined example sets, which serve as its knowledge base. The examples from the unlabeled set are "pre-labeled" by an initial classifier that is built using the limited available training data. By choosing appropriate weights for this pre-labeled data, the nearest neighbor classifier consistently improves on the original classifier.
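
The sketch below illustrates the two-stage scheme described in the abstract, under stated assumptions: a small labeled set and a larger unlabeled set, an initial classifier (here a decision tree, as a stand-in for whatever base learner is used) that pre-labels the unlabeled examples, and a hand-rolled weighted k-nearest-neighbor vote in which pre-labeled examples receive a reduced weight `unlab_weight`. The weight value, the base classifier, and the function name `two_stage_predict` are illustrative assumptions, not the exact configuration reported in the paper.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier


def two_stage_predict(X_lab, y_lab, X_unlab, X_test, k=5, unlab_weight=0.1):
    # Stage 1: build an initial classifier on the limited labeled data
    # and use it to "pre-label" the unlabeled examples.
    base = DecisionTreeClassifier().fit(X_lab, y_lab)
    y_pre = base.predict(X_unlab)

    # Combined knowledge base: original labeled examples get weight 1.0,
    # pre-labeled examples get a smaller (assumed) weight.
    X_all = np.vstack([X_lab, X_unlab])
    y_all = np.concatenate([y_lab, y_pre])
    w_all = np.concatenate([np.ones(len(y_lab)),
                            np.full(len(y_pre), unlab_weight)])

    # Stage 2: weighted k-nearest-neighbor vote over the combined set,
    # using Euclidean distance to find the k closest examples.
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_all - x, axis=1)
        nearest = np.argsort(dists)[:k]
        votes = Counter()
        for i in nearest:
            votes[y_all[i]] += w_all[i]
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)
```

Because the pre-labeled examples carry less weight than the genuinely labeled ones, a disagreeing labeled neighbor can still outvote several pre-labeled neighbors, which is one plausible way the combined knowledge base can improve on the initial classifier without being dominated by its mistakes.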