Efficient distributed data condensation for nearest neighbor classification

Authors:
Fabrizio Angiulli;Gianluigi Folino
Affiliations:
DEIS, Università della Calabria, Rende, Italy;Institute of High Performance Computing and Networking, Rende, Italy
Venue:
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Year:
2007

Abstract

This work presents PFCNN, a distributed method for computing a consistent subset of a very large data set for the nearest neighbor decision rule. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, several variants of the basic PFCNN method are introduced. Experiments on a class of synthetic data sets show that these methods can be profitably applied to enormous collections of data: they scale well, use memory efficiently, and achieve noticeable data reduction and good classification accuracy. To the best of our knowledge, this is the first distributed algorithm for computing a training-set-consistent subset for the nearest neighbor rule.
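
To make the notion of a training-set-consistent subset concrete, the following is a minimal single-machine sketch. It is not PFCNN: it uses a simple condensation loop in the style of the classic condensed nearest neighbor rule, and the function names, the Euclidean distance, and the toy data are all assumptions made for illustration. A subset is consistent when the nearest neighbor rule, using only that subset, classifies every training point correctly.

```python
import math

def nearest_label(subset, point):
    """Label of the subset element closest to `point` (Euclidean distance, an assumption)."""
    closest = min(subset, key=lambda item: math.dist(item[0], point))
    return closest[1]

def is_consistent(subset, training_set):
    """True if 1-NN over `subset` classifies every training point correctly."""
    return all(nearest_label(subset, x) == y for x, y in training_set)

def condense(training_set):
    """Illustrative condensation loop (not PFCNN): keep adding misclassified
    points until a full pass adds nothing. Assumes no two identical points
    carry different labels; otherwise the result may not be consistent."""
    subset = [training_set[0]]
    changed = True
    while changed:
        changed = False
        for x, y in training_set:
            if (x, y) not in subset and nearest_label(subset, x) != y:
                subset.append((x, y))
                changed = True
    return subset

# Hypothetical toy data: two well-separated classes in the plane.
training = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((1.1, 1.0), "b"), ((0.9, 1.2), "b"),
]
subset = condense(training)
print(len(subset), is_consistent(subset, training))
```

On data like this, the condensed subset is much smaller than the training set while still reproducing every training label, which is the kind of data reduction the abstract refers to. Each pass here scans the whole training set on one machine, which is the cost a distributed method would aim to spread across nodes.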