A grid-based architecture for nearest neighbor based condensation of huge datasets

Authors:
Fabrizio Angiulli;Gianluigi Folino
Affiliations:
University of Calabria, Rende, Italy;ICAR-CNR, Rende, Italy
Venue:
UPGRADE '08 Proceedings of the third international workshop on Use of P2P, grid and agents for the development of content networks
Year:
2008

Abstract

Grid computing provides services that let users discover, transfer, and manipulate large datasets distributed across different locations. Classifying large datasets without a centralized approach is a key problem in this kind of architecture and is essential, for instance, for the ever-growing datasets that bioinformatics scientists face. To this end, this paper presents Grid-FCNN, a grid-enabled architecture for classifying huge datasets using the nearest neighbor rule. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, two different strategies are presented, namely Grid-FCNN1 and Grid-FCNN2, and their performance in grid environments is analyzed. An analysis of the experimental results, obtained on both synthetic and real very large datasets, shows that these techniques are well suited for use in a Grid. Furthermore, the paper illustrates how the Grid-based algorithm can be applied in a real bioinformatics scenario.
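
As a rough illustration of what nearest-neighbor-based condensation means, the sketch below implements the classic condensed nearest neighbor idea (Hart's rule) on a single machine. It is not the FCNN, Grid-FCNN1, or Grid-FCNN2 algorithm from the paper, whose details the abstract does not give. The function names and the toy data are hypothetical and chosen only for this example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest_label(subset, point):
    """Return the label of the subset element closest to point (1-NN rule)."""
    return min(subset, key=lambda item: dist(item[0], point))[1]


def condense(training_set):
    """Hart-style condensation (illustrative only).

    Grows a subset of the training set until every training point is
    classified correctly by the 1-NN rule applied to that subset.
    """
    subset = [training_set[0]]  # seed with an arbitrary point
    changed = True
    while changed:
        changed = False
        for point, label in training_set:
            if (point, label) in subset:
                continue  # already kept; also guards against endless re-adding
            if nearest_label(subset, point) != label:
                subset.append((point, label))  # absorb the misclassified point
                changed = True
    return subset


# Hypothetical toy data: two well-separated classes on a line.
data = [((0.0,), "a"), ((0.1,), "a"), ((0.2,), "a"),
        ((5.0,), "b"), ((5.1,), "b"), ((5.2,), "b")]
condensed = condense(data)
```

On this toy input the condensed subset keeps one point per class, yet still classifies all six training points correctly. Shrinking the stored set in this way is what motivates condensation for very large datasets.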