A novel HMM-based clustering algorithm for the analysis of gene expression time-course data

Authors:
Yujing Zeng;Javier Garcia-Frias
Affiliations:
Department of Electrical and Computer Engineering, 140 Evans Hall, University of Delaware, Newark, DE 19716, USA;Department of Electrical and Computer Engineering, 140 Evans Hall, University of Delaware, Newark, DE 19716, USA
Venue:
Computational Statistics & Data Analysis
Year:
2006

Abstract

A novel hidden Markov model (HMM) and an associated clustering algorithm for the analysis of gene expression time-course data are proposed. The model, called the profile-HMM, is designed to take explicit account of the dynamic nature of temporal gene expression profiles, which many existing clustering methods ignore. In this model, gene expression dynamics are represented by a special set of paths, with each path characterizing a stochastic pattern. The profile-HMM is trained to contain the most likely set of stochastic patterns given the dynamic microarray data, and the clustering result is obtained by grouping together the time series that are most likely to be related to the same pattern. The novelty of the method is that the behavior of the whole gene expression dataset is modeled by a single HMM acting as a self-organizing map, so that all the clusters are implicitly and jointly defined in the model during the training phase. An attractive property of the profile-HMM clustering algorithm is its ability to identify the number of clusters automatically. Its performance is demonstrated on simulated and biological data.
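
The grouping step described in the abstract (each time series joins the pattern it is most likely to be related to) can be illustrated with a minimal sketch. This is not the authors' profile-HMM: the abstract does not specify the path structure, the emission model, or the training procedure. The sketch assumes, purely for illustration, that each path is already given as a list of per-time-point means, that emissions are independent Gaussians with a shared standard deviation `sigma`, and that training has already been done. The names `path_log_likelihood` and `assign_to_paths` and the toy data are hypothetical.

```python
import math


def path_log_likelihood(series, path_means, sigma=0.5):
    """Log-likelihood of one time series under one path.

    Assumption (not from the paper): independent Gaussian emissions
    centred on the path's mean at each time point, shared sigma.
    """
    const = -0.5 * math.log(2.0 * math.pi * sigma ** 2)
    return sum(
        const - (x - m) ** 2 / (2.0 * sigma ** 2)
        for x, m in zip(series, path_means)
    )


def assign_to_paths(dataset, paths, sigma=0.5):
    """Label each series with the index of its most likely path."""
    labels = []
    for series in dataset:
        scores = [path_log_likelihood(series, p, sigma) for p in paths]
        labels.append(max(range(len(paths)), key=scores.__getitem__))
    return labels


# Hypothetical, hand-picked paths standing in for a trained model:
# rising, falling, and flat expression patterns over four time points.
paths = [
    [0.0, 1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0, 0.0],
    [1.5, 1.5, 1.5, 1.5],
]

# Toy expression profiles (made up for this example).
dataset = [
    [0.1, 0.9, 2.2, 2.8],
    [2.9, 2.1, 0.8, 0.1],
    [0.0, 1.2, 1.9, 3.1],
]

labels = assign_to_paths(dataset, paths)

# Paths that attract no series do not form a cluster, so the number of
# clusters is the number of distinct labels actually used.
n_clusters = len(set(labels))
print(labels, n_clusters)
```

In this toy run the flat path attracts no series, so only two clusters remain. That mirrors, in a very simplified way, how a model that jointly holds all patterns can let the data determine the number of clusters rather than fixing it in advance.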