Novel Hybrid Hierarchical-K-means Clustering Method (H-K-means) for Microarray Analysis

  • Authors:
  • Bernard Chen;R. Harrison;Yi Pan;Phang C. Tai

  • Affiliations:
  • Department of Computer Science, Georgia State University, Atlanta;Department of Computer Science, Georgia State University, Atlanta;Department of Computer Science, Georgia State University, Atlanta;Department of Biology, Georgia State University, Atlanta

  • Venue:
  • CSBW '05 Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference - Workshops
  • Year:
  • 2005

Abstract

Hierarchical and K-means clustering are two major analytical tools for unsupervised analysis of microarray datasets. However, both have innate disadvantages. Hierarchical clustering cannot represent distinct clusters with similar expression patterns. Also, as clusters grow in size, the actual expression patterns become less relevant. K-means clustering requires the number of clusters to be specified in advance and chooses initial centroids randomly; in addition, it is sensitive to outliers. We present a novel hybrid approach that combines the merits of the two and discards the disadvantages mentioned above. It differs from the existing hybrid method, which carries out hierarchical clustering first to decide the location and number of clusters in one round and then runs K-means clustering in another round. In brief, we cluster about half of the data through hierarchical clustering and follow with K-means for the remaining half, all in a single round. Our approach also provides a mechanism to handle outliers. Compared with the existing hybrid clustering approach and with K-means clustering, under two different distance measures on Eisen's yeast microarray data, our method always generates much higher-quality clusters.
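The abstract gives only the outline of the hybrid scheme, so the following is a minimal illustrative sketch rather than the authors' algorithm. It assumes centroid-linkage agglomerative merging, Euclidean distance, a stopping rule of "about half the points belong to merged clusters", and a fixed distance threshold for outliers. None of these details are stated in the abstract, and the names `hybrid_cluster`, `merged_fraction`, and `outlier_dist` are hypothetical.

```python
import math

def dist(a, b):
    # Euclidean distance (assumption: the abstract does not name its measures).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def hybrid_cluster(data, merged_fraction=0.5, outlier_dist=None, iters=10):
    # Stage 1 (assumed form): agglomerative merging with centroid linkage,
    # stopped once about `merged_fraction` of the points sit in merged clusters.
    clusters = [[i] for i in range(len(data))]

    def n_merged():
        return sum(len(c) for c in clusters if len(c) > 1)

    while n_merged() < merged_fraction * len(data) and len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            ci = centroid([data[k] for k in clusters[i]])
            for j in range(i + 1, len(clusters)):
                d = dist(ci, centroid([data[k] for k in clusters[j]]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]

    # The merged clusters fix both the number of clusters and the seeds.
    centroids = [centroid([data[k] for k in c]) for c in clusters if len(c) > 1]
    if not centroids:
        raise ValueError("no merged clusters; increase merged_fraction")

    # Stage 2: K-means-style assignment and update for all points.
    # Assumed outlier rule: a point farther than `outlier_dist` from every
    # centroid gets label -1 and is ignored when centroids are updated.
    labels = [-1] * len(data)
    for _ in range(iters):
        new_labels = []
        for p in data:
            ds = [dist(p, c) for c in centroids]
            j = min(range(len(ds)), key=ds.__getitem__)
            too_far = outlier_dist is not None and ds[j] > outlier_dist
            new_labels.append(-1 if too_far else j)
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [data[i] for i, l in enumerate(labels) if l == j]
            if members:
                centroids[j] = centroid(members)
    return labels, centroids

# Toy data (not from the paper): two tight groups and one distant point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels, centroids = hybrid_cluster(data, outlier_dist=5.0)
print(labels, centroids)
```

In this sketch the number of clusters falls out of the hierarchical stage instead of being chosen in advance, and the seeds are not random, which mirrors the two K-means weaknesses the abstract sets out to remove.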