An optimal hierarchical clustering algorithm for gene expression data

  • Authors:
  • Sudip Seal;Srikanth Komarina;Srinivas Aluru

  • Affiliations:
  • Department of Electrical and Computer Engineering, Iowa State University, Ames, IA;Department of Electrical and Computer Engineering, Iowa State University, Ames, IA;Department of Electrical and Computer Engineering, Iowa State University, Ames, IA

  • Venue:
  • Information Processing Letters
  • Year:
  • 2005

Abstract

Microarrays measure the expression levels of thousands of genes simultaneously. Clustering algorithms are applied to gene expression data to find co-regulated genes. A commonly used clustering strategy is the Pearson correlation coefficient based hierarchical clustering algorithm presented in [Proc. Nat. Acad. Sci. 95 (25) (1998) 14863-14868], which takes O(N^3) time. We note that this run time can be reduced to O(N^2) by applying known hierarchical clustering algorithms [Proc. 9th Annual ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 619-628] to this problem. In this paper, we present an algorithm that runs in O(N log N) time using a geometrical reduction and show that it is optimal.
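
The abstract does not spell out the geometrical reduction, so the following is only an illustrative sketch and not the authors' construction. It shows one standard identity that connects Pearson correlation to geometry: once each expression profile is centered to zero mean and scaled to unit length, the squared Euclidean distance between two profiles equals 2(1 - r), where r is their Pearson correlation. The function names and the sample profiles are hypothetical, chosen for illustration.

```python
import math


def standardize(profile):
    """Center a profile to zero mean and scale it to unit Euclidean length."""
    mean = sum(profile) / len(profile)
    centered = [x - mean for x in profile]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        # A constant profile has zero variance, so its correlation is undefined.
        raise ValueError("constant profile has undefined correlation")
    return [c / norm for c in centered]


def pearson(x, y):
    """Pearson correlation as the dot product of the standardized profiles."""
    return sum(a * b for a, b in zip(standardize(x), standardize(y)))


def squared_distance(x, y):
    """Squared Euclidean distance between the standardized profiles."""
    sx, sy = standardize(x), standardize(y)
    return sum((a - b) ** 2 for a, b in zip(sx, sy))


# Hypothetical expression profiles for two genes over four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.1, 5.9, 8.2]

r = pearson(gene_a, gene_b)
d2 = squared_distance(gene_a, gene_b)
# The two quantities agree up to floating-point rounding.
print(abs(d2 - 2.0 * (1.0 - r)) < 1e-12)
```

Under this identity, highly correlated profiles map to nearby points on a unit sphere, so a correlation-based similarity can be restated as a Euclidean proximity question. Whether the paper uses exactly this mapping is an assumption here.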