A Novel Approach for Automatic Number of Clusters Detection in Microarray Data Based on Consensus Clustering

  • Authors:
  • Nguyen Xuan Vinh; Julien Epps

  • Venue:
  • BIBE '09 Proceedings of the 2009 Ninth IEEE International Conference on Bioinformatics and Bioengineering
  • Year:
  • 2009

Abstract

Estimating the true number of clusters in a data set is one of the major challenges in cluster analysis. Yet in certain domains, knowing the true number of clusters is of high importance. For example, in medical research, detecting the true number of groups and sub-groups of a cancer would be of utmost importance for effective treatment. In this paper we propose a novel method to estimate the number of clusters in a microarray data set based on the consensus clustering approach. Although the main objective of consensus clustering is to discover a robust, high-quality cluster structure in a data set, closer inspection of the set of clusterings obtained can often give valuable information about the appropriate number of clusters present. More specifically, the set of clusterings obtained when the specified number of clusters coincides with the true number of clusters tends to be less diverse. To quantify this diversity, we develop a novel index, the Consensus Index (CI), which is built upon a suitable clustering similarity measure such as the well-known Adjusted Rand Index (ARI) or our recently developed information-theoretic index, the Adjusted Mutual Information (AMI). Our experiments on both synthetic and real microarray data sets indicate that the CI is a useful indicator for determining the appropriate number of clusters.
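
The idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for the CI, so this sketch assumes it is the mean pairwise ARI over a set of clusterings produced with the same number of clusters K, and that the K with the highest value is selected. The function names (`adjusted_rand_index`, `consensus_index`, `select_k`) and the toy label vectors are hypothetical, not taken from the paper; in practice the clusterings would come from repeated runs of a base algorithm on resampled or perturbed data.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pair counts from the contingency table and from its row/column marginals.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case: both labelings are one big cluster or all singletons.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def consensus_index(clusterings):
    """Assumed CI: mean pairwise ARI over a set of clusterings (higher = less diverse)."""
    pairs = list(combinations(clusterings, 2))
    if not pairs:
        raise ValueError("need at least two clusterings")
    return sum(adjusted_rand_index(a, b) for a, b in pairs) / len(pairs)


def select_k(clusterings_by_k):
    """Return the K whose set of clusterings has the highest consensus index."""
    return max(clusterings_by_k, key=lambda k: consensus_index(clusterings_by_k[k]))


# Hypothetical toy input: label vectors for 6 items from three runs per K.
clusterings_by_k = {
    2: [[0, 0, 0, 1, 1, 1], [1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]],  # stable runs
    3: [[0, 0, 1, 1, 2, 2], [0, 1, 1, 2, 2, 0], [0, 1, 2, 0, 1, 2]],  # unstable runs
}
best_k = select_k(clusterings_by_k)
```

Because ARI is invariant to label permutations, the three K = 2 runs above count as identical partitions, so their set is the least diverse and K = 2 is selected. Swapping in AMI as the similarity measure would only require replacing `adjusted_rand_index` inside `consensus_index`.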