A classification of cluster validity indexes based on membership degree and applications
WISM'11 Proceedings of the 2011 international conference on Web information systems and mining - Volume Part I
Clustering is widely used to discover underlying patterns and groups in data, and the quality of the clusters produced by the many clustering algorithms in use needs to be validated. The need for cluster validation arises from the fundamental definition of unsupervised learning: because clustering is unsupervised, the correct number of clusters is not known in advance, a hurdle that can be cleared by using cluster validity indices to assess the quality of the clusters. We have developed a tool for cluster validation as part of GOAPhAR, a web-based tool that integrates information from disparate sources on gene annotations, protein annotations, identifiers associated with probe sets, functional pathways, protein interactions, Gene Ontology, and publicly available microarray datasets. Our cluster validity tool calculates three indices of clustering quality, namely the Silhouette, Dunn's, and Davies-Bouldin indices, and outputs them to the user. The values of these indices can be used to judge the quality of a clustering and to guide the selection of an appropriate clustering algorithm and number of clusters.
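As an illustration of what the three indices measure, the following is a minimal sketch in plain Python. It is not the GOAPhAR implementation, which the abstract does not describe. It assumes Euclidean distance, the common textbook definitions (Dunn's index as the minimum between-cluster point distance divided by the maximum cluster diameter, Davies-Bouldin with centroid-based scatter), at least two clusters, and no two clusters sharing a centroid. The function names `group`, `silhouette`, `dunn`, and `davies_bouldin` are hypothetical.

```python
from math import dist          # Euclidean distance, Python 3.8+
from statistics import mean


def group(points, labels):
    """Collect the points belonging to each cluster label."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def silhouette(points, labels):
    """Mean silhouette width; higher is better, range [-1, 1]."""
    clusters = group(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the sum includes dist(p, p) == 0, so divide by len - 1)
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(mean(dist(p, q) for q in other)
                for k, other in clusters.items() if k != lab)
        scores.append((b - a) / max(a, b))
    return mean(scores)


def dunn(points, labels):
    """Dunn's index: minimum separation / maximum diameter; higher is better."""
    clusters = list(group(points, labels).values())
    min_sep = min(dist(p, q)
                  for i, a in enumerate(clusters)
                  for b in clusters[i + 1:]
                  for p in a for q in b)
    max_diam = max(dist(p, q) for c in clusters for p in c for q in c)
    return min_sep / max_diam


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower is better."""
    clusters = list(group(points, labels).values())
    centroids = [tuple(mean(axis) for axis in zip(*c)) for c in clusters]
    # scatter: mean distance of each cluster's points to its centroid
    scatter = [mean(dist(p, m) for p in c)
               for c, m in zip(clusters, centroids)]
    n = len(clusters)
    worst = [max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                 for j in range(n) if j != i)
             for i in range(n)]
    return mean(worst)


points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
good = [0, 0, 1, 1]   # groups the nearby points together
bad = [0, 1, 0, 1]    # pairs each point with a distant one

print(silhouette(points, good), silhouette(points, bad))
print(dunn(points, good))            # 4.0
print(davies_bouldin(points, good))  # 0.25
```

On this toy data the compact labelling scores a silhouette of about 0.754, while the scrambled labelling scores below zero. Comparing such values across candidate algorithms or cluster counts is the kind of selection the abstract describes. Note that the Silhouette and Dunn's indices are maximized, whereas the Davies-Bouldin index is minimized.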