Clustering of temporal gene expression data by regularized spline regression and an energy based similarity measure

Authors:
Wei-Feng Zhang;Chao-Chun Liu;Hong Yan
Affiliations:
Department of Applied Mathematics, South China Agricultural University, 483 Wushan Road, Guangzhou 510642, China;Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong;Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
Venue:
Pattern Recognition
Year:
2010

Abstract

Clustering analysis of temporal gene expression data is widely used to study dynamic biological systems, for example to identify sets of genes that are regulated by the same mechanism. However, temporal gene expression data often contain noise, missing data points, and non-uniformly sampled time points, which pose challenges for traditional clustering methods in extracting meaningful information. In this paper, we introduce an improved clustering approach based on regularized spline regression and an energy-based similarity measure. The proposed approach models each gene expression profile as a B-spline expansion, whose coefficients are estimated from the observed data by a regularized least squares scheme. To compensate for the limited information in noisy and short gene expression profiles, we use the genes correlated with each profile as the test set for choosing the optimal number of basis functions and the regularization parameter. We show that this treatment helps to avoid over-fitting. After fitting continuous representations of the gene expression profiles, we cluster them using an energy-based similarity measure. This measure captures the temporal information and relative changes of the time series through their first and second derivatives. We demonstrate that our method is robust to noise and can produce meaningful clustering results.
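
The two building blocks named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the second-difference penalty on the spline coefficients, the fixed values of `n_basis` and `lam` (the paper selects them using correlated genes, which is not reproduced here), the symmetrized cross Teager-Kaiser form of the energy operator, and the normalization in `energy_similarity`. All function names are hypothetical.

```python
import numpy as np

def bspline_basis(x, knots, degree):
    """B-spline design matrix built with the Cox-de Boor recursion."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator functions of the half-open knot intervals.
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # Assign the right endpoint to the last non-empty interval.
    last = np.max(np.nonzero(t[:-1] < t[1:])[0])
    B[x == t[-1], last] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((len(x), len(t) - 1 - d))
        for i in range(len(t) - 1 - d):
            left = t[i + d] - t[i]
            right = t[i + d + 1] - t[i + 1]
            if left > 0:
                B_new[:, i] += (x - t[i]) / left * B[:, i]
            if right > 0:
                B_new[:, i] += (t[i + d + 1] - x) / right * B[:, i + 1]
        B = B_new
    return B

def fit_regularized_spline(times, values, n_basis=6, degree=3, lam=0.1):
    """Regularized least squares fit; NaN values are treated as missing."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)
    a, b = times.min(), times.max()
    interior = np.linspace(a, b, n_basis - degree + 1)[1:-1]
    knots = np.concatenate([[a] * (degree + 1), interior, [b] * (degree + 1)])
    B = bspline_basis(times[observed], knots, degree)
    # Assumed penalty: squared second differences of adjacent coefficients.
    D = np.diff(np.eye(n_basis), n=2, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ values[observed])
    return knots, coef

def evaluate(knots, coef, x, degree=3):
    """Evaluate the fitted spline at the points x."""
    return bspline_basis(x, knots, degree) @ coef

def cross_energy(x, y, dt):
    """Symmetrized cross Teager-Kaiser energy: x'y' - (x y'' + y x'') / 2."""
    dx, dy = np.gradient(x, dt), np.gradient(y, dt)
    ddx, ddy = np.gradient(dx, dt), np.gradient(dy, dt)
    return dx * dy - 0.5 * (x * ddy + y * ddx)

def energy_similarity(x, y, dt):
    """Illustrative normalization (an assumption); identical curves give 1."""
    num = 2.0 * np.abs(cross_energy(x, y, dt)).sum()
    den = (np.abs(cross_energy(x, x, dt)).sum()
           + np.abs(cross_energy(y, y, dt)).sum())
    return num / den if den > 0 else 0.0

# Synthetic profiles: unevenly sampled, with one missing observation.
times = np.array([0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0])
gene_a = np.sin(times / 2.0)
gene_a[3] = np.nan
gene_b = np.sin(times / 2.0 + 0.3)

grid = np.linspace(times.min(), times.max(), 200)
dt = grid[1] - grid[0]
knots_a, coef_a = fit_regularized_spline(times, gene_a)
knots_b, coef_b = fit_regularized_spline(times, gene_b)
curve_a = evaluate(knots_a, coef_a, grid)
curve_b = evaluate(knots_b, coef_b, grid)
score = energy_similarity(curve_a, curve_b, dt)
```

Evaluating the fitted splines on a common uniform grid is what lets profiles with different sampling times and missing points be compared. The derivatives are approximated here by finite differences on that grid; they could instead be taken analytically from the B-spline expansion. A pairwise matrix of such scores could then be passed to any clustering algorithm that accepts a similarity matrix.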