This work presents PFCNN, a distributed method for computing a consistent subset of a very large data set for the nearest neighbor classification rule. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, several variants of the basic PFCNN method are introduced. The spatial cost, CPU cost, and communication overhead of all the algorithms are analyzed. Experimental results on both synthetic and real very large data sets show that these methods can be profitably applied to enormous collections of data: they scale up well and are efficient in memory consumption, confirming the theoretical analysis, and they achieve noticeable data reduction and good classification accuracy. To the best of our knowledge, this is the first distributed algorithm for computing a training set consistent subset for the nearest neighbor rule.
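To make the notion of a training set consistent subset concrete, the following minimal sketch checks consistency under the 1-nearest-neighbor rule and builds such a subset with a simple sequential condensation loop in the style of Hart's classic condensed nearest neighbor rule. It is not the PFCNN algorithm and involves no distribution; the function names, the Euclidean metric, and the assumption that no two identical points carry different labels are illustrative choices, not details taken from the paper.

```python
import math

def nearest_label(point, subset):
    """Label of the subset element closest to `point` (1-NN rule, Euclidean)."""
    closest = min(subset, key=lambda item: math.dist(point, item[0]))
    return closest[1]

def is_consistent(subset, training):
    """True if 1-NN over `subset` labels every training point correctly."""
    return all(nearest_label(x, subset) == y for x, y in training)

def condense(training):
    """Sequential condensation (illustrative only, not PFCNN).

    Repeatedly scans the training set and adds every point that the
    current subset misclassifies, until a full pass adds nothing.
    Assumes no two identical points have conflicting labels.
    """
    subset = [training[0]]
    changed = True
    while changed:
        changed = False
        for x, y in training:
            # The membership check keeps the loop finite even if the
            # no-conflicting-duplicates assumption is violated.
            if (x, y) not in subset and nearest_label(x, subset) != y:
                subset.append((x, y))
                changed = True
    return subset

if __name__ == "__main__":
    # Hypothetical toy data: two well-separated classes on a line.
    data = [((0.0,), 0), ((0.1,), 0), ((0.2,), 0),
            ((1.0,), 1), ((1.1,), 1), ((1.2,), 1)]
    kept = condense(data)
    print(len(kept), is_consistent(kept, data))
```

Each pass of this loop scans the whole training set against the current subset, which is the kind of cost that motivates faster and distributed condensation methods for very large data sets.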