Clustering of unevenly sampled gene expression time-series data

  • Authors:
  • C. S. Möller-Levet; F. Klawonn; K.-H. Cho; H. Yin; O. Wolkenhauer

  • Affiliations:
  • Department of Electrical Engineering and Electronics, University of Manchester Institute of Science and Technology, Manchester M60 1QD, UK; Department of Computer Science, University of Applied Sciences, D-38302 Wolfenbüttel, Germany; College of Medicine, Seoul National University, Chongno-gu, Seoul, 110-799, Republic of Korea and Korea Bio-MAX Center, Seoul National University, Gwanak-gu, Seoul, 151-818, Republic of Korea; Department of Electrical Engineering and Electronics, University of Manchester Institute of Science and Technology, Manchester M60 1QD, UK; Department of Computer Science, Systems Biology & Bioinformatics Group, University of Rostock, Albert-Einstein Str. 21, 18059 Rostock, Germany

  • Venue:
  • Fuzzy Sets and Systems
  • Year:
  • 2005

Abstract

Time-course measurements are becoming a common type of microarray experiment. The temporal order of the data and the varying lengths of the sampling intervals are important and should be taken into account when clustering time series. However, the shortness of gene expression time series limits the use of conventional statistical models and techniques for time-series analysis. To address this problem, this paper proposes the fuzzy short time-series (FSTS) clustering algorithm, which clusters profiles based on the similarity of their relative changes in expression level and the corresponding temporal information. A major advantage of fuzzy clustering is that genes can belong to more than one group, revealing distinctive features of each gene's function and regulation. Several examples illustrate the performance of the proposed algorithm. In addition, we validate the algorithm by clustering the genes that define the model profiles in Chu et al. (Science, 282 (1998) 699). The FSTS algorithm is compared with fuzzy c-means, k-means, average-linkage hierarchical clustering and random clustering. Performance is evaluated with a well-established cluster validity measure, which shows that the FSTS algorithm outperforms the compared algorithms at clustering similar rates of change of expression across successive, unevenly distributed time points. Moreover, the FSTS algorithm clustered the genes defining the model profiles in a biologically meaningful way.
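The abstract describes FSTS as comparing the relative change of expression between successive, unevenly spaced time points and then assigning fuzzy memberships. The Python sketch below illustrates that idea under stated assumptions. `sts_distance` compares the slopes of two profiles over each sampling interval; this is our reading of "relative change of expression level and the corresponding temporal information", since the abstract gives no formula. `fuzzy_memberships` applies the standard fuzzy c-means membership formula with that distance in place of the Euclidean one. Both function names are hypothetical, and the prototype-update step of the actual FSTS algorithm is not shown.

```python
import math
from typing import List, Sequence


def sts_distance(x: Sequence[float], v: Sequence[float], t: Sequence[float]) -> float:
    """Slope-based distance between two profiles sampled at the same times t.

    Assumption (not taken from the abstract): profiles are compared by the
    rate of change (slope) over every, possibly uneven, sampling interval.
    """
    if not (len(x) == len(v) == len(t)) or len(t) < 2:
        raise ValueError("profiles and times must have equal length >= 2")
    total = 0.0
    for k in range(len(t) - 1):
        dt = t[k + 1] - t[k]
        if dt <= 0:
            raise ValueError("time points must be strictly increasing")
        # Dividing by the interval length makes uneven spacing matter.
        slope_x = (x[k + 1] - x[k]) / dt
        slope_v = (v[k + 1] - v[k]) / dt
        total += (slope_x - slope_v) ** 2
    return math.sqrt(total)


def fuzzy_memberships(
    x: Sequence[float],
    prototypes: Sequence[Sequence[float]],
    t: Sequence[float],
    m: float = 2.0,
) -> List[float]:
    """Standard fuzzy c-means membership of profile x in each cluster,
    using sts_distance instead of the Euclidean distance (assumption)."""
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    d = [sts_distance(x, p, t) for p in prototypes]
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:
        # x has exactly the same slopes as one or more prototypes:
        # share full membership among those clusters.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((di / dj) ** exponent for dj in d) for di in d]
```

Because only slopes are compared, a profile shifted by a constant offset has distance zero from the original, so it receives full membership in a cluster whose prototype has the same slopes. With a fuzzifier `m` close to 1 the memberships approach a hard assignment, while a larger `m` spreads a gene more evenly across clusters.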