Parallel k-most similar neighbor classifier for mixed data

Authors:
Guillermo Sanchez-Diaz;Anilu Franco-Arcega;Carlos Aguirre-Salado;Ivan Piza-Davila;Luis R. Morales-Manilla;Uriel Escobar-Franco
Affiliations:
Universidad Autonoma de San Luis Potosi, San Luis Potosi, SLP, Mexico;Universidad Autonoma del Estado de Hidalgo, Pachuca, Hgo., Mexico;Universidad Autonoma de San Luis Potosi, San Luis Potosi, SLP, Mexico;Instituto Tecnologico y de Estudios Superiores de Occidente, Tlaquepaque, Jal., Mexico;Universidad Politecnica de Tulancingo, Tulancingo, Hgo., Mexico;Universidad Politecnica de Tulancingo, Tulancingo, Hgo., Mexico
Venue:
IDEAL'12 Proceedings of the 13th international conference on Intelligent Data Engineering and Automated Learning
Year:
2012

Abstract

This paper presents a parallelization of the incremental algorithm inc-k-msn for mixed data and similarity functions that do not satisfy metric properties. The algorithm is suitable for processing large data sets because it keeps in main memory only the k-most similar neighbors found at step t and traverses the training data set only once. Several experiments with synthetic and real data are presented.
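
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' inc-k-msn algorithm, whose details are not given here. It only assumes that each worker scans its partition of the training set once, keeps just the k most similar objects seen so far in a bounded heap, and that the partial results are then merged and put to a majority vote. The similarity function, the tolerance `numeric_tol`, all function names, and the toy data are hypothetical; threads stand in for whatever parallel platform the paper uses.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def similarity(a, b, numeric_tol=1.0):
    """Hypothetical mixed-data similarity: fraction of features that agree.

    Numeric features agree when they differ by at most numeric_tol;
    all other features agree when they are equal. No metric property
    (such as the triangle inequality) is assumed or required.
    """
    agree = 0
    for x, y in zip(a, b):
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            agree += abs(x - y) <= numeric_tol
        else:
            agree += x == y
    return agree / len(a)


def k_most_similar(stream, query, k):
    """Scan the stream once, keeping only the k best (similarity, label) pairs."""
    heap = []  # min-heap: heap[0] is the weakest neighbor kept so far
    for features, label in stream:
        item = (similarity(features, query), label)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item > heap[0]:
            heapq.heapreplace(heap, item)
    return heap


def merge_and_vote(partial_heaps, k):
    """Merge per-partition results and return the majority label of the top k."""
    best = heapq.nlargest(k, (item for h in partial_heaps for item in h))
    votes = Counter(label for _, label in best)
    return votes.most_common(1)[0][0]


def classify_parallel(partitions, query, k, workers=2):
    """Process each partition in its own worker, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda p: k_most_similar(p, query, k), partitions))
    return merge_and_vote(partial, k)


# Toy training set: (numeric, categorical, categorical) features with string labels.
train = [
    ((1.0, "red", "small"), "A"),
    ((1.5, "red", "small"), "A"),
    ((9.0, "blue", "large"), "B"),
    ((8.5, "blue", "large"), "B"),
    ((1.2, "red", "large"), "A"),
    ((9.5, "green", "large"), "B"),
]
query = (1.1, "red", "small")

print(classify_parallel([train[:3], train[3:]], query, k=3))
```

Memory use per worker is bounded by k regardless of partition size, which is the property the abstract highlights. Ties in similarity are broken by label order in this sketch, so labels are assumed to be mutually comparable (for example, all strings).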