A novel HMM-based clustering algorithm for the analysis of gene expression time-course data

  • Authors:
  • Yujing Zeng; Javier Garcia-Frias

  • Affiliations:
  • Department of Electrical and Computer Engineering, 140 Evans Hall, University of Delaware, Newark, DE 19716, USA (both authors)

  • Venue:
  • Computational Statistics & Data Analysis
  • Year:
  • 2006

Abstract

A novel hidden Markov model (HMM) and clustering algorithm for the analysis of gene expression time-course data is proposed. The proposed model, called the profile-HMM, is specifically designed to account for the dynamic nature of temporal gene expression profiles, which many existing clustering methods ignore. In this model, gene expression dynamics are represented by a special set of paths, each characterizing a stochastic pattern. The profile-HMM is trained to contain the most likely set of stochastic patterns given the dynamic microarray data, and the clustering result is obtained by grouping together the time series that are most likely to be related to the same pattern. The novelty of the method is that the behavior of the whole gene expression dataset is modeled by a single HMM acting as a self-organizing map, so that all the clusters are implicitly and jointly defined in the model during the training phase. An attractive property of the profile-HMM clustering algorithm is its ability to identify the number of clusters automatically. Its performance is demonstrated on simulated and biological data.
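The abstract does not give the model's structure or equations, so the following is only a minimal sketch of the general idea (one HMM whose paths act as patterns, with genes grouped by their most likely path), built on our own assumptions: a trellis with K discrete expression levels at each of T time points, Gaussian emissions with one shared variance, time-dependent transitions, a uniform initial distribution, and hard Viterbi training in place of whatever training procedure the paper actually uses. The functions `viterbi_path`, `cluster_by_path` and `viterbi_training` and the `means[t][k]` layout are hypothetical.

```python
import math
from collections import defaultdict


def log_gauss(x, mean, var):
    """Log density of the univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_path(series, means, log_trans, var):
    """Most likely level path of one time series (assumed trellis layout).

    means[t][k]: emission mean of level k at time t.
    log_trans[t][j][k]: log P(level k at t+1 | level j at t).
    The initial distribution is assumed uniform, a constant that is dropped.
    """
    T, K = len(series), len(means[0])
    score = [log_gauss(series[0], means[0][k], var) for k in range(K)]
    back = []  # back[t-1][k]: best predecessor of level k at time t
    for t in range(1, T):
        new_score, ptr = [], []
        for k in range(K):
            best_j = max(range(K), key=lambda j: score[j] + log_trans[t - 1][j][k])
            new_score.append(score[best_j] + log_trans[t - 1][best_j][k]
                             + log_gauss(series[t], means[t][k], var))
            ptr.append(best_j)
        score = new_score
        back.append(ptr)
    k = max(range(K), key=lambda j: score[j])  # best final level
    path = [k]
    for ptr in reversed(back):  # trace back to t = 0
        k = ptr[k]
        path.append(k)
    return tuple(reversed(path))


def cluster_by_path(data, means, log_trans, var):
    """Group genes whose series share the same Viterbi path."""
    clusters = defaultdict(list)
    for gene, series in data.items():
        clusters[viterbi_path(series, means, log_trans, var)].append(gene)
    return dict(clusters)


def viterbi_training(data, means, var, n_iter=10, pseudo=1e-3):
    """Hard-EM sketch: decode paths, then re-estimate means and transitions."""
    T, K = len(means), len(means[0])
    means = [row[:] for row in means]  # do not mutate the caller's list
    log_trans = [[[math.log(1.0 / K)] * K for _ in range(K)] for _ in range(T - 1)]
    for _ in range(n_iter):
        clusters = cluster_by_path(data, means, log_trans, var)
        sums = [[0.0] * K for _ in range(T)]
        counts = [[0] * K for _ in range(T)]
        # Pseudocounts keep every transition probability strictly positive.
        trans = [[[pseudo] * K for _ in range(K)] for _ in range(T - 1)]
        for path, genes in clusters.items():
            for g in genes:
                for t, k in enumerate(path):
                    sums[t][k] += data[g][t]
                    counts[t][k] += 1
                for t in range(T - 1):
                    trans[t][path[t]][path[t + 1]] += 1
        for t in range(T):
            for k in range(K):
                if counts[t][k]:  # unused levels keep their old mean
                    means[t][k] = sums[t][k] / counts[t][k]
        for t in range(T - 1):
            for j in range(K):
                total = sum(trans[t][j])
                log_trans[t][j] = [math.log(c / total) for c in trans[t][j]]
    return means, log_trans, cluster_by_path(data, means, log_trans, var)


if __name__ == "__main__":
    data = {"up1": [-1.1, 0.9, 1.0], "up2": [-0.9, 1.1, 1.2],
            "down1": [1.0, -1.0, -0.9], "down2": [0.8, -1.2, -1.1]}
    init = [[-1.0, 1.0] for _ in range(3)]  # two levels, three time points
    _, _, clusters = viterbi_training(data, init, var=0.25, n_iter=5)
    for path, genes in clusters.items():
        print(path, genes)
```

Only paths that some gene actually follows become clusters: in the toy data, 2 of the 2^3 = 8 possible paths are populated. This is one plausible reading of how a single model can yield the number of clusters without fixing it in advance, not a reproduction of the paper's criterion.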