A modified correlation coefficient based similarity measure for clustering time-course gene expression data

  • Authors:
  • Young Sook Son;Jangsun Baek

  • Affiliations:
  • Department of Statistics, Chonnam National University, Gwangju 500-757, Republic of Korea;Department of Statistics, Chonnam National University, Gwangju 500-757, Republic of Korea

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2008

Abstract

Gene expression levels are often measured consecutively in time through microarray experiments to detect the cellular processes underlying observed regulatory effects and to assign functionality to genes whose function is not yet known. Clustering methods allow us to group genes that show similar time-course expression profiles and that are thus likely to be co-regulated. The correlation coefficient, the most popular similarity measure in the context of gene expression data, is not very reliable in representing the association of two temporal profile patterns. Moreover, clustering methods that use the correlation coefficient generate the same clustering result even when the time points are permuted arbitrarily. We propose a new similarity measure for clustering time-course gene expression data. The proposed measure is based on the correlation coefficient and on two indices: one representing the concordance of the temporal profile patterns of two profiles, and one representing the concordance of the time points at which their maximum and minimum expression levels are measured. We applied the hierarchical clustering method with the proposed similarity measure to both synthetic and breast cancer cell line data, and observed favorable results compared to the correlation coefficient based method. The proposed similarity measure is simple to implement, and it is much more consistent for clustering than the correlation coefficient based method according to the cross-validation criterion.
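The permutation point in the abstract can be checked numerically. The following is a minimal sketch, not code from the paper: the two profiles `gene_a` and `gene_b` are made-up values, and `pearson` is a hypothetical helper written here for illustration. It shows that applying the same arbitrary reordering of time points to both profiles leaves the Pearson correlation coefficient unchanged, which is why a correlation-only similarity measure cannot see temporal order. The proposed measure itself is not reproduced, since the abstract does not give its formula.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative (made-up) expression profiles over six time points.
gene_a = [0.1, 0.8, 1.5, 0.9, 0.2, -0.4]
gene_b = [0.0, 0.5, 1.2, 1.1, 0.3, -0.2]

# Draw one arbitrary permutation of the time points and apply it to both profiles.
rng = random.Random(0)
perm = list(range(len(gene_a)))
rng.shuffle(perm)

r_original = pearson(gene_a, gene_b)
r_permuted = pearson([gene_a[i] for i in perm], [gene_b[i] for i in perm])

# The two values agree up to floating-point rounding.
print(math.isclose(r_original, r_permuted, rel_tol=1e-9))
```

Because every pairwise correlation is preserved under such a reordering, any clustering built only on those correlations is preserved as well.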