A grid-based architecture for nearest neighbor based condensation of huge datasets

  • Authors: Fabrizio Angiulli; Gianluigi Folino
  • Affiliations: University of Calabria, Rende, Italy; ICAR-CNR, Rende, Italy
  • Venue: UPGRADE '08 Proceedings of the third international workshop on Use of P2P, grid and agents for the development of content networks
  • Year: 2008

Abstract

Grid computing provides services that let users discover, transfer, and manipulate large datasets distributed across different locations. Classifying large datasets without a centralized approach is a key problem in this kind of architecture; it is essential, for instance, for the ever-growing datasets that bioinformatics researchers face. To this end, this paper presents Grid-FCNN, a grid-enabled architecture for classifying huge datasets using the nearest neighbor rule. To cope with the communication overhead typical of distributed environments and to reduce memory requirements, two different strategies, Grid-FCNN1 and Grid-FCNN2, are presented, and their performance in grid environments is analyzed. An analysis of experimental results on very large synthetic and real datasets shows that these techniques are suitable for use in a Grid. Furthermore, the paper illustrates how the Grid-based algorithm can be applied in a real bioinformatics scenario.
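To make the idea of nearest neighbor based condensation concrete, the following is a minimal single-machine sketch in Python. It is not Grid-FCNN, Grid-FCNN1, or Grid-FCNN2, whose details the abstract does not give; it implements the classic Hart-style condensed nearest neighbor rule, which selects a subset of the training data that still classifies every training point correctly under the 1-NN rule. The function names `nearest_label` and `condense`, and the representation of the data as `(point, label)` pairs, are assumptions made for illustration only.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_label(subset, point):
    """Return the label of the element of `subset` closest to `point` (1-NN rule)."""
    return min(subset, key=lambda item: dist(item[0], point))[1]


def condense(training_set):
    """Hart-style condensation (illustrative, not the paper's algorithm).

    Starts from the first training example and keeps adding every example
    that the current subset misclassifies, until a full pass adds nothing.
    `training_set` is a non-empty list of (point, label) pairs.
    """
    subset = [training_set[0]]
    chosen = {0}  # indices already in the subset; guarantees termination
    changed = True
    while changed:
        changed = False
        for i, (point, label) in enumerate(training_set):
            if i not in chosen and nearest_label(subset, point) != label:
                subset.append((point, label))
                chosen.add(i)
                changed = True
    return subset
```

On well-separated classes the condensed subset is typically much smaller than the training set, which is the property that makes condensation attractive when memory and communication are limited. How the work is split across grid nodes is specific to the paper and is not modeled here.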