Nearest neighbor editing aided by unlabeled data

Authors:
Donghai Guan;Weiwei Yuan;Young-Koo Lee;Sungyoung Lee
Affiliations:
Department of Computer Engineering, Kyung Hee University, 446 701 Yongin, Republic of Korea;Department of Computer Engineering, Kyung Hee University, 446 701 Yongin, Republic of Korea;Department of Computer Engineering, Kyung Hee University, 446 701 Yongin, Republic of Korea;Department of Computer Engineering, Kyung Hee University, 446 701 Yongin, Republic of Korea
Venue:
Information Sciences: an International Journal
Year:
2009

Abstract

This paper proposes a novel method for nearest neighbor editing. Nearest neighbor editing aims to increase the classifier's generalization ability by removing noisy instances from the training set. Traditionally, nearest neighbor editing decides whether to remove or retain each instance by a vote among the labeled instances in the training set. Motivated by semi-supervised learning, we instead propose an editing methodology in which each training instance is edited by a vote among all available instances, both labeled and unlabeled. We expect that editing performance can be boosted by appropriately using unlabeled data. The idea relies on the fact that in many applications, in addition to the training instances, many unlabeled instances are also available, since they require no human annotation effort. Three popular data editing methods, edited nearest neighbor, repeated edited nearest neighbor, and All k-NN, are adopted to verify the idea and are tested on a set of UCI data sets. Experimental results indicate that all three editing methods achieve improved performance with the aid of unlabeled data. Moreover, the improvement is more pronounced when the ratio of training data to unlabeled data is small.
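
The editing idea in the abstract can be illustrated with a minimal sketch of edited nearest neighbor in which unlabeled instances also take part in the vote. The abstract does not say how unlabeled instances obtain a label to vote with, so this sketch assumes each one is pseudo-labeled by a k-NN vote over the labeled set. That step, along with the function names `knn_vote` and `enn_edit` and the default `k=3`, is a hypothetical choice for illustration and may differ from the authors' procedure.

```python
import numpy as np

def knn_vote(query, points, labels, k, exclude=None):
    """Return the majority label among the k points nearest to `query`."""
    dists = np.linalg.norm(points - query, axis=1)
    if exclude is not None:
        dists[exclude] = np.inf  # keep the instance out of its own vote
    nearest = np.argsort(dists, kind="stable")[:k]
    values, counts = np.unique(labels[nearest], return_counts=True)
    return values[np.argmax(counts)]  # ties go to the smallest label

def enn_edit(X_lab, y_lab, X_unl=None, k=3):
    """Edited nearest neighbor: flag labeled instances whose label agrees
    with the k-NN vote of the other instances.

    Returns a boolean mask over the labeled instances (True = retain).
    """
    X_lab = np.asarray(X_lab, dtype=float)
    y_lab = np.asarray(y_lab)
    if X_unl is None or len(X_unl) == 0:
        # Traditional editing: only labeled instances vote.
        voters_X, voters_y = X_lab, y_lab
    else:
        X_unl = np.asarray(X_unl, dtype=float)
        # ASSUMPTION (not from the abstract): pseudo-label each unlabeled
        # instance by a k-NN vote over the labeled instances.
        pseudo = np.array([knn_vote(x, X_lab, y_lab, k) for x in X_unl])
        # Labeled instances stay first so index i still refers to X_lab[i].
        voters_X = np.vstack([X_lab, X_unl])
        voters_y = np.concatenate([y_lab, pseudo])
    return np.array([
        knn_vote(X_lab[i], voters_X, voters_y, k, exclude=i) == y_lab[i]
        for i in range(len(X_lab))
    ])
```

Calling `enn_edit(X_lab, y_lab)` gives the traditional labeled-only vote, while `enn_edit(X_lab, y_lab, X_unl)` lets the pseudo-labeled instances vote as well. The training set is then reduced with `X_lab[mask]` and `y_lab[mask]`. Repeated edited nearest neighbor would apply this function until the mask stops changing.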