New rank methods for reducing the size of the training set using the nearest neighbor rule

Authors:
Juan Ramón Rico-Juan;José Manuel Iñesta
Affiliations:
Dpto. Lenguajes y Sistemas Informáticos, Universidad de Alicante, E-03071 Alicante, Spain;Dpto. Lenguajes y Sistemas Informáticos, Universidad de Alicante, E-03071 Alicante, Spain
Venue:
Pattern Recognition Letters
Year:
2012

Abstract

This paper proposes new rank methods for selecting the best prototypes from a training set, so that the size of the set can be fixed by an external parameter while the classification accuracy is maintained. Traditional methods that filter the training set in a classification task, such as editing or condensing, apply rules to the set in order to remove outliers or to keep the prototypes that help in the classification. In our approach, new voting methods are proposed to compute a probability for each prototype that reflects how much it helps to classify a new sample correctly. This probability is used to rank the training set: a relevance factor between 0 and 1 selects, for each class, the best candidates whose accumulated probabilities are less than that parameter. This approach makes it possible to choose the number of prototypes needed to maintain or even increase the classification accuracy. Results obtained on several high-dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.
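The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting rules, so the rule below (each training sample casts one vote for its nearest neighbor of the same class) is only an assumed stand-in, not the authors' method. The function names `vote_probabilities` and `select_prototypes`, the Euclidean distance, and the choice to always keep at least one prototype per class are likewise assumptions made for illustration.

```python
import math
from collections import defaultdict


def vote_probabilities(points, labels):
    """Assumed voting rule (not from the paper): every sample gives one
    vote to its nearest same-class neighbor; votes are then normalized
    so that the probabilities within each class sum to 1."""
    votes = [0] * len(points)
    for i, (p, lab) in enumerate(zip(points, labels)):
        same_class = [j for j in range(len(points))
                      if j != i and labels[j] == lab]
        if not same_class:
            continue  # a lone sample of its class has nobody to vote for
        nearest = min(same_class, key=lambda j: math.dist(p, points[j]))
        votes[nearest] += 1

    totals = defaultdict(int)
    for v, lab in zip(votes, labels):
        totals[lab] += v
    return [v / totals[lab] if totals[lab] else 0.0
            for v, lab in zip(votes, labels)]


def select_prototypes(probs, labels, relevance):
    """Rank the prototypes of each class by probability (descending) and
    keep them while the accumulated probability stays within `relevance`
    (a value between 0 and 1). Returns the sorted indices that are kept."""
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    selected = []
    for idxs in by_class.values():
        ranked = sorted(idxs, key=lambda i: probs[i], reverse=True)
        kept, acc = [], 0.0
        for i in ranked:
            acc += probs[i]
            # Assumption: keep at least one prototype per class; the small
            # tolerance guards against floating-point rounding.
            if kept and acc > relevance + 1e-12:
                break
            kept.append(i)
        selected.extend(kept)
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: two well-separated classes of three 2-D points each.
    points = [(0, 0), (0, 1), (0, 3), (5, 0), (5, 1), (5, 3)]
    labels = ["a", "a", "a", "b", "b", "b"]
    probs = vote_probabilities(points, labels)
    print(select_prototypes(probs, labels, relevance=0.7))
```

In this sketch a smaller relevance factor keeps fewer prototypes per class, which mirrors how the external parameter controls the size of the reduced training set.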