Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data

  • Authors:
  • Desheng Huang;Wei Pan

  • Affiliations:
  • Department of Mathematics, China Medical University Shenyang, China;Division of Biostatistics, School of Public Health, University of Minnesota Minneapolis, MN 55455-0392, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2006

Quantified Score

Hi-index 3.84

Visualization

Abstract

Motivation: Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering. Results: To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a new distance metric, which shrinks a gene expression-based distance towards 0 if and only if the two genes share a common gene function. A two-step procedure is used. First, the shrinkage distance metric is used in any distance-based clustering method, e.g. K-medoids or hierarchical clustering, to cluster the genes with known functions. Second, while keeping the clustering results from the first step for the genes with known functions, the expression-based distance metric is used to cluster the remaining genes of unknown function, assigning each of them to either one of the clusters obtained in the first step or some new clusters. A simulation study and an application to gene function prediction for the yeast demonstrate the advantage of our proposal over the standard method. Contact: weip@biostat.umn.edu