Shared farthest neighbor approach to clustering of high dimensionality, low cardinality data

  • Authors: Stefano Rovetta; Francesco Masulli

  • Affiliation (both authors): Department of Computer and Information Sciences and CNISM, University of Genova, Via Dodecaneso 35, I-16146 Genova, Italy

  • Venue: Pattern Recognition
  • Year: 2006

Abstract

Clustering algorithms are routinely used in biomedical disciplines and are a basic tool in bioinformatics. Depending on the task at hand, the two most popular options are central partitional techniques and agglomerative hierarchical clustering techniques, together with their derivatives. These methods are well studied and well established. However, both categories have drawbacks, related to data dimensionality for partitional algorithms and to the bottom-up structure for hierarchical agglomerative algorithms. To overcome these limitations, and motivated by the problem of gene expression analysis with DNA microarrays, we present a hierarchical clustering algorithm based on a completely different principle: the analysis of shared farthest neighbors. We present a framework for clustering using ranks and indexes, and introduce the shared farthest neighbors (SFN) clustering criterion. We illustrate the properties of the method and present experimental results on different data sets, evaluating the clustering against extrinsic knowledge given by class labels.
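
The abstract does not spell out the SFN criterion, so the following is only an illustrative sketch of the underlying idea, not the authors' algorithm. It assumes Euclidean distance, a hypothetical parameter `k` for the number of farthest neighbors kept per point, and made-up toy data. The function names `farthest_neighbors` and `shared_farthest_counts` are likewise invented for this example. The intuition it tries to show is that points in the same group tend to agree on which points are farthest from them.

```python
import math

def farthest_neighbors(points, k):
    """For each point, return the set of indices of its k farthest points.

    Assumption: Euclidean distance; k is a hypothetical parameter.
    """
    result = []
    for i, p in enumerate(points):
        # Rank all other points by decreasing distance from p.
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]), reverse=True)
        result.append(set(others[:k]))
    return result

def shared_farthest_counts(points, k):
    """Return S, where S[i][j] counts the farthest neighbors shared by i and j."""
    fn = farthest_neighbors(points, k)
    n = len(points)
    return [[len(fn[i] & fn[j]) for j in range(n)] for i in range(n)]

# Two well-separated groups in the plane (toy data, not from the paper).
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),
]
S = shared_farthest_counts(points, k=3)

# Points in the same group share all their farthest neighbors;
# points in different groups share none.
print(S[0][1], S[0][3])
```

In this toy case, a high shared count marks pairs that plausibly belong together. How the paper turns such counts into a hierarchical clustering, and how ranks and indexes enter its framework, is not stated in the abstract and is left open here.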