Mining time-shifting co-regulation patterns from gene expression data

Authors:
Ying Yin;Yuhai Zhao;Bin Zhang;Guoren Wang
Affiliations:
Northeastern University, Shenyang, China;Northeastern University, Shenyang, China;Northeastern University, Shenyang, China;Northeastern University, Shenyang, China
Venue:
APWeb/WAIM'07 Proceedings of the joint 9th Asia-Pacific web and 8th international conference on web-age information management conference on Advances in data and web management
Year:
2007

Abstract

Previous pattern-finding work focuses only on grouping objects that are coherent under the same subset of dimensions. As a result, a biologically important pattern, time-shifting, is ignored when analyzing time-series gene expression data. In this paper, we propose a new definition of coherent cluster for time-series gene expression data, called ts-cluster. The proposed model allows (1) the expression profiles of genes in a cluster to be coherent on different subsets of dimensions, i.e. these genes follow a certain time-shifting relationship, and (2) relative rather than absolute expression magnitude to be taken into account, which makes the model more tolerant of noise. Such clusters, missed by previous research, facilitate the study of regulatory relationships between genes. We also present and implement a novel algorithm to mine all significant ts-clusters. Experiments on both synthetic and real datasets show that the ts-cluster algorithm efficiently detects a significant number of clusters missed by previous models, and that these clusters are potentially of high biological significance.