An instance selection algorithm based on reverse nearest neighbor

Authors:
Bi-Ru Dai;Shu-Ming Hsu
Affiliations:
The Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, ROC;The Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan, ROC
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Year:
2011


Abstract

Data reduction extracts a subset from a dataset. Its advantages are lower storage requirements and more efficient classification. Training on the subset can maintain classification accuracy, and sometimes accuracy even improves because noise is eliminated. The key is to choose representative samples while ignoring noise. Many instance selection algorithms are based on the nearest neighbor (NN) decision rule. Some of these algorithms select samples using one of two strategies, incremental or decremental. Incremental algorithms start with some instances as samples and iteratively add to the sample set those instances whose class label differs from that of their nearest sample. Decremental algorithms remove instances whose class label differs from that of the majority of their k nearest neighbors (kNN). In contrast, we propose an algorithm based on the reverse nearest neighbor (RNN), called Reverse Nearest Neighbor Reduction (RNNR). RNNR selects samples that can represent other instances in the same class. In addition, RNNR does not need to scan the dataset iteratively, which takes much processing time. Experimental results show that RNNR achieves comparable accuracy and selects fewer samples than the comparators.
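The abstract does not give the details of RNNR, so the following is only a minimal sketch of the general idea of selecting instances through reverse nearest neighbors, not the authors' algorithm. It assumes Euclidean distance, computes nearest neighbors within each class separately, and uses a simple greedy rule (largest reverse-neighbor set first); the function names `reverse_nearest_neighbors` and `select_by_rnn` are hypothetical.

```python
import math
from collections import defaultdict


def reverse_nearest_neighbors(points):
    """Map each index i to the indices whose nearest neighbor is i.

    Brute-force search with Euclidean distance; a point is never its
    own nearest neighbor. Ties go to the lowest index.
    """
    n = len(points)
    rnn = {i: [] for i in range(n)}
    for j in range(n):
        best, best_d = None, math.inf
        for i in range(n):
            if i == j:
                continue
            d = math.dist(points[j], points[i])
            if d < best_d:
                best, best_d = i, d
        if best is not None:
            rnn[best].append(j)
    return rnn


def select_by_rnn(X, y):
    """Pick, per class, instances that represent their reverse neighbors.

    Assumed greedy rule (illustrative only): visit instances in order of
    decreasing reverse-neighbor count; keep an instance unless an already
    kept instance represents it, then mark its reverse neighbors as
    represented. Each class is processed once, with no repeated scans.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    selected = []
    for members in by_class.values():
        rnn = reverse_nearest_neighbors([X[i] for i in members])
        order = sorted(rnn, key=lambda i: (-len(rnn[i]), i))
        covered = set()
        for i in order:
            if i in covered:
                continue
            selected.append(members[i])
            covered.add(i)
            covered.update(rnn[i])
    return sorted(selected)


# Two well-separated classes of three points each.
X = [(0, 0), (1, 0), (3, 0), (10, 10), (11, 10), (13, 10)]
y = [0, 0, 0, 1, 1, 1]
print(select_by_rnn(X, y))  # [1, 4]
```

In this toy dataset the middle point of each class is the nearest neighbor of both of its classmates, so one instance per class is kept. The brute-force neighbor search is quadratic in the class size; a spatial index would be the usual replacement on larger data.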