Clustering analysis of temporal gene expression data is widely used to study dynamic biological systems, for example to identify sets of genes that are regulated by the same mechanism. However, temporal gene expression data often contain noise, missing data points, and non-uniformly sampled time points, which makes it difficult for traditional clustering methods to extract meaningful information. In this paper, we introduce an improved clustering approach based on regularized spline regression and an energy-based similarity measure. The proposed approach models each gene expression profile as a B-spline expansion, whose coefficients are estimated from the observed data by a regularized least-squares scheme. To compensate for the limited information in noisy and short gene expression profiles, we use each gene's correlated genes as the test set for choosing the optimal number of basis functions and the regularization parameter. We show that this treatment helps to avoid over-fitting. After fitting continuous representations of the gene expression profiles, we cluster them using an energy-based similarity measure, which captures the temporal information and relative changes of the time series through their first and second derivatives. We demonstrate that our method is robust to noise and can produce meaningful clustering results.
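The two steps described above, a regularized B-spline fit followed by an energy computed from first and second derivatives, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The abstract does not specify the regularizer, so a second-order difference penalty on the coefficients (P-spline style) is swapped in. Derivatives are taken by finite differences on a dense grid rather than analytically from the spline. The cross-energy form x'y' - (x y'' + y x'')/2 is assumed because it reduces to the Teager-Kaiser operator x'^2 - x x'' when x = y. The normalization in `energy_similarity`, the knot layout, the values of `lam`, and the synthetic data are all hypothetical.

```python
import numpy as np

def bspline_basis(t, knots, degree=3):
    """B-spline design matrix at points t via the Cox-de Boor recursion."""
    t = np.asarray(t, dtype=float)
    knots = np.asarray(knots, dtype=float)
    n_int = len(knots) - 1
    # Degree 0: indicator of each half-open knot interval [k_i, k_{i+1}).
    B = np.zeros((t.size, n_int))
    for i in range(n_int):
        B[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Assign the right endpoint to the last non-empty interval.
    last = np.max(np.nonzero(np.diff(knots) > 0)[0])
    B[t == knots[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((t.size, n_int - d))
        for i in range(n_int - d):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                B_new[:, i] += (t - knots[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (knots[i + d + 1] - t) / right * B[:, i + 1]
        B = B_new
    return B

def fit_regularized_spline(t_obs, y_obs, knots, degree=3, lam=1e-4):
    """Regularized least squares: min ||B c - y||^2 + lam * ||D c||^2.

    D is a second-order difference matrix (an assumed penalty, see lead-in).
    """
    B = bspline_basis(t_obs, knots, degree)
    D = np.diff(np.eye(B.shape[1]), n=2, axis=0)
    y = np.asarray(y_obs, dtype=float)
    return np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)

def derivatives(x, grid):
    """First and second derivatives by finite differences on the grid."""
    dx = np.gradient(x, grid, edge_order=2)
    ddx = np.gradient(dx, grid, edge_order=2)
    return dx, ddx

def cross_energy(x, y, grid):
    """Assumed symmetric cross-energy; equals x'^2 - x*x'' when x is y."""
    dx, ddx = derivatives(x, grid)
    dy, ddy = derivatives(y, grid)
    return dx * dy - 0.5 * (x * ddy + y * ddx)

def energy_similarity(x, y, grid):
    """Hypothetical normalization: 1.0 for identical curves (uniform grid)."""
    exy = cross_energy(x, y, grid).mean()
    exx = cross_energy(x, x, grid).mean()
    eyy = cross_energy(y, y, grid).mean()
    return 2.0 * exy / (exx + eyy)

# Toy usage on synthetic, unevenly sampled profiles (not data from the paper).
rng = np.random.default_rng(0)
knots = np.concatenate(([0.0] * 3, np.linspace(0.0, 10.0, 6), [10.0] * 3))
t_obs = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 4.5, 5.0, 6.5, 7.0, 8.0, 9.0, 10.0])
grid = np.linspace(0.0, 10.0, 501)
curves = []
for phase in (0.0, 0.3):
    y_obs = np.sin(0.5 * t_obs + phase) + 0.05 * rng.standard_normal(t_obs.size)
    coef = fit_regularized_spline(t_obs, y_obs, knots, lam=1e-2)
    curves.append(bspline_basis(grid, knots) @ coef)
print("similarity:", energy_similarity(curves[0], curves[1], grid))
```

In a fuller pipeline, `lam` and the number of basis functions would be chosen by scoring each candidate fit against the profiles of correlated genes, as the abstract describes, and the pairwise similarities would then be passed to a clustering routine.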