New rank methods for reducing the size of the training set using the nearest neighbor rule

  • Authors:
  • Juan Ramón Rico-Juan; José Manuel Iñesta

  • Affiliations:
  • Dpto. Lenguajes y Sistemas Informáticos, Universidad de Alicante, E-03071 Alicante, Spain (both authors)

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2012

Abstract

This paper proposes new rank methods that select the best prototypes from a training set, so that the size of the set is fixed by an external parameter while classification accuracy is maintained. Traditional methods for filtering the training set in a classification task, such as editing or condensing, apply rules to the set in order to remove outliers or to keep the prototypes that help in the classification. In our approach, new voting methods are proposed to compute the probability that a prototype helps to classify a new sample correctly. This probability is the key to sorting the training set: a relevance factor from 0 to 1 is used to select, for each class, the best candidates whose accumulated probabilities are less than that parameter. This approach makes it possible to select the number of prototypes necessary to maintain or even increase the classification accuracy. The results obtained on different high-dimensional databases show that these methods maintain the final error rate while reducing the size of the training set.
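
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the voting rules, so the rule used here (each sample casts one vote for its nearest neighbor of the same class) is an assumption made for illustration only and should not be read as the authors' method. The function name `select_prototypes`, its parameters, and the reading of "accumulated probabilities less than the parameter" as "keep adding while the running sum is below the relevance factor" are likewise hypothetical.

```python
import math
from collections import defaultdict


def select_prototypes(points, labels, relevance):
    """Return sorted indices of the prototypes kept for a relevance in [0, 1].

    Assumed voting rule (illustrative, not from the paper): every sample
    gives one vote to its nearest neighbor that shares its class.
    """
    n = len(points)
    votes = [0.0] * n
    for i in range(n):
        best, best_d = None, math.inf
        for j in range(n):
            if j != i and labels[j] == labels[i]:
                d = math.dist(points[i], points[j])  # Euclidean distance
                if d < best_d:
                    best, best_d = j, d
        if best is not None:
            votes[best] += 1.0

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)

    kept = []
    for idxs in by_class.values():
        total = sum(votes[i] for i in idxs)
        if total == 0:
            # No votes were cast (e.g. a single-sample class): keep everything.
            kept.extend(idxs)
            continue
        # Rank prototypes of this class by normalized votes, highest first.
        ranked = sorted(idxs, key=lambda i: -votes[i])
        acc = 0.0
        for i in ranked:
            # Assumed stopping rule: stop once the accumulated
            # probability is no longer below the relevance factor.
            if acc >= relevance:
                break
            kept.append(i)
            acc += votes[i] / total
    return sorted(kept)
```

As a usage example under these assumptions, take the one-dimensional points `[(0.0,), (1.0,), (3.0,), (7.0,), (10.0,), (11.0,), (20.0,)]` with labels `["a", "a", "a", "a", "b", "b", "b"]`. A relevance of 0.6 keeps indices `[0, 1, 5]`, and lowering it to 0.3 keeps only `[1, 5]`, so the external parameter directly controls how many prototypes survive in each class.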