Fast k most similar neighbor classifier for mixed data based on approximating and eliminating

  • Authors:
  • Selene Hernández-Rodríguez; J. Ariel Carrasco-Ochoa; J. Fco. Martínez-Trinidad

  • Affiliations:
  • Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, México (all authors)

  • Venue:
  • PAKDD'08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2008

Abstract

The k nearest neighbor (k-NN) classifier has been widely used as a nonparametric technique in Pattern Recognition. To decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify (the query) and the prototypes in the training set T. However, when T is large, this exhaustive comparison is expensive. To avoid this problem, many fast k-NN algorithms have been developed. Some of these algorithms are based on Approximating-Eliminating search, where the Approximating and Eliminating steps rely on the triangle inequality. However, in soft sciences the prototypes are usually described by qualitative and quantitative features (mixed data), and sometimes the comparison function does not satisfy the triangle inequality. Therefore, in this work, a fast k most similar neighbor classifier for mixed data (AEMD) is presented. This classifier consists of two phases. In the first phase, a binary similarity matrix among the prototypes in T is computed and stored. In the second phase, new Approximating and Eliminating steps, which are not based on the triangle inequality, are applied. The proposed classifier is compared against other fast k-NN algorithms adapted to work with mixed data, and experiments on real datasets are reported.
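The abstract does not specify the similarity function or the AEMD data structures, so the following Python sketch only illustrates the exhaustive k most similar neighbor (k-MSN) baseline over mixed data that fast methods such as AEMD aim to speed up. The per-feature similarity (exact match for qualitative features, normalized difference for quantitative ones) and all names are illustrative assumptions, not the functions defined in the paper.

```python
# Minimal sketch: exhaustive k most similar neighbor (k-MSN) search over mixed data.
# The similarity function is an assumed example; it need not satisfy the triangle inequality.
import heapq

def mixed_similarity(x, y, is_qualitative, ranges):
    """Average per-feature similarity between two mixed-data prototypes."""
    sims = []
    for xi, yi, qual, rng in zip(x, y, is_qualitative, ranges):
        if qual:
            sims.append(1.0 if xi == yi else 0.0)          # qualitative: exact match
        else:
            sims.append(1.0 - abs(xi - yi) / rng if rng > 0 else 1.0)  # quantitative
    return sum(sims) / len(sims)

def k_msn_classify(query, T, labels, k, is_qualitative, ranges):
    """Exhaustive baseline: compare the query against every prototype in T."""
    heap = []  # min-heap of (similarity, index); keeps the k most similar seen so far
    for idx, proto in enumerate(T):
        s = mixed_similarity(query, proto, is_qualitative, ranges)
        if len(heap) < k:
            heapq.heappush(heap, (s, idx))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, idx))
    # Majority vote among the k most similar neighbors
    votes = {}
    for _, idx in heap:
        votes[labels[idx]] = votes.get(labels[idx], 0) + 1
    return max(votes, key=votes.get)

# Toy usage: two quantitative features and one qualitative feature
T = [(1.0, 0.2, 'red'), (0.9, 0.1, 'red'), (0.1, 0.8, 'blue'), (0.2, 0.9, 'blue')]
labels = ['A', 'A', 'B', 'B']
is_qualitative = [False, False, True]
ranges = [1.0, 1.0, None]  # feature ranges are only needed for quantitative features
print(k_msn_classify((0.95, 0.15, 'red'), T, labels, k=3,
                     is_qualitative=is_qualitative, ranges=ranges))  # -> 'A'
```

The cost of this baseline grows linearly with |T|, which is the motivation for the two-phase AEMD scheme described in the abstract: precomputing a binary similarity matrix among the prototypes and then approximating and eliminating candidates without relying on the triangle inequality.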