Validation Measures for Clustering Algorithms Incorporating Biological Information

  • Authors:
  • Susmita Datta; Somnath Datta

  • Affiliations:
  • University of Louisville, USA; University of Louisville, USA

  • Venue:
  • IMSCCS '06: Proceedings of the First International Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06), Volume 1
  • Year:
  • 2006

Abstract

Cluster analysis is the most commonly performed procedure (often regarded as a first step) on a set of gene expression profiles. A closely related problem is selecting, from the impressive list of clustering algorithms that currently exist, one that is optimal in some sense. In this paper, we propose two validation measures, each with two parts: one measuring the statistical consistency (stability) of the clusters produced and the other measuring their biological functional consistency. A good clustering algorithm should have small values of these measures. We illustrate our methods using two sets of expression profiles obtained from a breast cancer data set. Six well-known clustering algorithms (UPGMA, K-Means, Diana, Fanny, Model-Based, and SOM) were evaluated. Although the exact ordering depends on the particular data set (expression profiles) and validation measure used, overall UPGMA appears to be optimal for the breast cancer data set we considered. R code: the R code used in this paper is available from the author upon request.
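The abstract does not give formulas for the two parts of each measure, so the following is only an illustrative sketch, not the paper's method. It assumes that the stability part works like an "average proportion of non-overlap": each sample column is deleted in turn, the genes are re-clustered, and the sketch records how much each gene's cluster changes. The biological part is a hypothetical pair-based score that counts same-cluster gene pairs sharing no functional annotation. Both function names and the annotation format are hypothetical. In both parts, smaller values are better.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set


def non_overlap_stability(full: Sequence[int], reduced: List[Sequence[int]]) -> float:
    """Hypothetical stability part (average proportion of non-overlap).

    full[i] is gene i's cluster label using all sample columns; reduced[j][i]
    is its label after column j is deleted. Label IDs need not match between
    runs because only cluster memberships are compared. Returns a value in
    [0, 1]; smaller means more stable clusters.
    """
    n = len(full)
    total = 0.0
    for labels in reduced:
        for i in range(n):
            # Genes sharing gene i's cluster in the full-data clustering.
            c0 = {g for g in range(n) if full[g] == full[i]}
            # Genes sharing gene i's cluster after one column is deleted.
            cj = {g for g in range(n) if labels[g] == labels[i]}
            total += 1.0 - len(c0 & cj) / len(c0)
    return total / (n * len(reduced))


def biological_inconsistency(labels: Sequence[int], annot: Dict[int, Set[str]]) -> float:
    """Hypothetical functional-consistency part.

    annot maps a gene index to its set of functional classes (assumed input).
    Returns the fraction of same-cluster pairs of annotated genes that share
    no functional class; smaller is better.
    """
    pairs = [(a, b) for a, b in combinations(sorted(annot), 2)
             if labels[a] == labels[b]]
    if not pairs:
        return 0.0
    mismatched = sum(1 for a, b in pairs if not (annot[a] & annot[b]))
    return mismatched / len(pairs)
```

For example, with full-data labels `[0, 0, 1, 1]` and a single leave-one-column-out run giving `[0, 1, 1, 1]`, the stability part is 0.25. Genes 0 and 1 each lose half of their original cluster, while genes 2 and 3 keep theirs. Comparing membership sets rather than raw labels keeps the score unaffected by arbitrary relabeling of clusters between runs.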