Fast k most similar neighbor classifier for mixed data based on approximating and eliminating

  • Authors:
  • Selene Hernández-Rodríguez; J. Ariel Carrasco-Ochoa; J. Fco. Martínez-Trinidad

  • Affiliations:
  • Computer Science Department, National Institute of Astrophysics, Optics and Electronics, Puebla, México (all authors)

  • Venue:
  • PAKDD'08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2008

Abstract

The k nearest neighbor (k-NN) classifier has been widely used as a nonparametric technique in Pattern Recognition. To decide the class of a new prototype, the k-NN classifier performs an exhaustive comparison between the prototype to classify (the query) and the prototypes in the training set T. However, when T is large, this exhaustive comparison is expensive. To avoid this problem, many fast k-NN algorithms have been developed. Some of these algorithms are based on Approximating-Eliminating search, where the Approximating and Eliminating steps rely on the triangle inequality. However, in soft sciences the prototypes are usually described by qualitative and quantitative features (mixed data), and sometimes the comparison function does not satisfy the triangle inequality. Therefore, in this work, a fast k most similar neighbor classifier for mixed data (AEMD) is presented. This classifier consists of two phases. In the first phase, a binary similarity matrix among the prototypes in T is computed and stored. In the second phase, new Approximating and Eliminating steps, which are not based on the triangle inequality, are applied. The proposed classifier is compared against other fast k-NN algorithms adapted to work with mixed data, and experiments on real datasets are reported.
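The abstract does not specify the similarity function or the AEMD data structures, so the following Python sketch only illustrates the exhaustive k most similar neighbor (k-MSN) baseline over mixed data that fast methods such as AEMD aim to speed up. The per-feature similarity (exact match for qualitative features, normalized difference for quantitative ones) and all names are illustrative assumptions, not the functions defined in the paper.

```python
# Minimal sketch: exhaustive k most similar neighbor (k-MSN) search over mixed data.
# The similarity function is an assumed example; it need not satisfy the triangle inequality.
import heapq

def mixed_similarity(x, y, is_qualitative, ranges):
    """Average per-feature similarity between two mixed-data prototypes."""
    sims = []
    for xi, yi, qual, rng in zip(x, y, is_qualitative, ranges):
        if qual:
            sims.append(1.0 if xi == yi else 0.0)          # qualitative: exact match
        else:
            sims.append(1.0 - abs(xi - yi) / rng if rng > 0 else 1.0)  # quantitative
    return sum(sims) / len(sims)

def k_msn_classify(query, T, labels, k, is_qualitative, ranges):
    """Exhaustive baseline: compare the query against every prototype in T."""
    heap = []  # min-heap of (similarity, index); keeps the k most similar seen so far
    for idx, proto in enumerate(T):
        s = mixed_similarity(query, proto, is_qualitative, ranges)
        if len(heap) < k:
            heapq.heappush(heap, (s, idx))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, idx))
    # Majority vote among the k most similar neighbors
    votes = {}
    for _, idx in heap:
        votes[labels[idx]] = votes.get(labels[idx], 0) + 1
    return max(votes, key=votes.get)

# Toy usage: two quantitative features and one qualitative feature
T = [(1.0, 0.2, 'red'), (0.9, 0.1, 'red'), (0.1, 0.8, 'blue'), (0.2, 0.9, 'blue')]
labels = ['A', 'A', 'B', 'B']
is_qualitative = [False, False, True]
ranges = [1.0, 1.0, None]  # feature ranges are only needed for quantitative features
print(k_msn_classify((0.95, 0.15, 'red'), T, labels, k=3,
                     is_qualitative=is_qualitative, ranges=ranges))  # -> 'A'
```

The cost of this baseline grows linearly with |T|, which is the motivation for the two-phase AEMD scheme described in the abstract: precomputing a binary similarity matrix among the prototypes and then approximating and eliminating candidates without relying on the triangle inequality.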