Cluster validation: An integrative method for cluster analysis

  • Authors:
  • M. Visvanathan;B. S. Adagarla;H. L. Gerald;P. Smith

  • Affiliations:
  • Bioinf. Core Facility, Univ. of Kansas, Lawrence, KS, USA;Bioinf. Core Facility, Univ. of Kansas, Lawrence, KS, USA;Bioinf. Core Facility, Univ. of Kansas, Lawrence, KS, USA;Univ. of Kansas Med. Center, KS, USA

  • Venue:
  • BIBMW '09 Proceedings of the 2009 IEEE International Conference on Bioinformatics and Biomedicine Workshop
  • Year:
  • 2009

Abstract

Clustering is widely used to discover underlying patterns and groups in data, and there is a need to validate the quality of the clusters generated by the numerous clustering algorithms in use. The need for cluster validation arises from the fundamental definition of unsupervised learning: because clustering is an unsupervised process, predicting the correct number of clusters is a hurdle, which can be cleared by using cluster validity indices to assess the quality of the clusters. We have developed a cluster validation tool as part of GOAPhAR, a web-based tool that integrates information from disparate sources regarding gene annotations, protein annotations, identifiers associated with probe sets, functional pathways, protein interactions, Gene Ontology and publicly available microarray datasets. Our cluster validity tool calculates three indices of clustering quality, namely the Silhouette, Dunn's and Davies-Bouldin indices, and reports them to the user. The values of these indices can be used to judge the quality of a clustering and to guide the selection of an appropriate clustering algorithm and number of clusters.
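The abstract names the three indices but not the exact formulas or distance measure GOAPhAR uses. The following is a minimal sketch, not the tool's implementation, of common textbook definitions, assuming Euclidean distance, Python 3.8+ (for math.dist) and points given as tuples with one integer or string label per point. The function names are hypothetical. Higher Silhouette and Dunn values, and lower Davies-Bouldin values, indicate more compact and better-separated clusters.

```python
import math
from collections import defaultdict
from itertools import combinations


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    return dict(clusters)


def silhouette(points, labels):
    """Mean silhouette width over all points, in [-1, 1]; higher is better."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        # (the zero distance from p to itself is excluded by dividing by len - 1)
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def dunn(points, labels):
    """Smallest between-cluster distance divided by largest diameter; higher is better."""
    clusters = list(group_by_label(points, labels).values())
    # Diameter: largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Separation: smallest distance between points in different clusters.
    min_separation = min(
        math.dist(p, q)
        for ci, cj in combinations(clusters, 2)
        for p in ci
        for q in cj
    )
    return math.inf if max_diameter == 0 else min_separation / max_diameter


def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def davies_bouldin(points, labels):
    """Mean over clusters of the worst scatter-to-separation ratio; lower is better."""
    clusters = list(group_by_label(points, labels).values())
    cents = [centroid(c) for c in clusters]
    # Scatter S_i: mean distance from the members of cluster i to its centroid.
    scatter = [
        sum(math.dist(p, ct) for p in c) / len(c) for c, ct in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # R_i: worst (largest) ratio (S_i + S_j) / d(c_i, c_j) over the other clusters.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    labs = [0, 0, 1, 1]  # two tight, well-separated groups
    print(round(silhouette(pts, labs), 3))  # 0.9
    print(dunn(pts, labs))  # 10.0
    print(davies_bouldin(pts, labs))  # 0.1
```

To use these scores for choosing an algorithm or a number of clusters, one would compute them for each candidate labelling of the same data and compare. The three indices need not agree, so they are best read together rather than in isolation.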