Previous work on pattern discovery focuses only on grouping objects under the same subset of dimensions. As a result, an important and biologically interesting pattern, time-shifting, is ignored in the analysis of time series gene expression data. In this paper, we propose a new definition of coherent cluster for time series gene expression data, called ts-cluster. The proposed model (1) allows the expression profiles of genes in a cluster to be coherent on different subsets of dimensions, i.e., these genes follow a certain time-shifting relationship, and (2) considers relative rather than absolute expression magnitude, which makes it more tolerant of the negative impact of noise. Such patterns are missed by previous research, and capturing them facilitates the study of regulatory relationships between genes. A novel algorithm is also presented and implemented to mine all significant ts-clusters. Experimental results on both synthetic and real datasets show that the ts-cluster algorithm efficiently detects a significant number of clusters missed by previous models, and that these clusters are potentially of high biological significance.
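
To make the two ideas in the abstract concrete, the following is a minimal sketch and not the ts-cluster algorithm itself. It assumes that "relative expression magnitude" can be approximated by the ratio between consecutive time points, and that two genes are coherent under a time shift if those ratios agree within a tolerance once one profile is delayed. The function names (`relative_changes`, `coherent_shift`) and the parameters `tolerance` and `min_overlap` are hypothetical and chosen only for illustration.

```python
def relative_changes(profile):
    """Ratios between consecutive time points (assumes positive expression values)."""
    return [later / earlier for earlier, later in zip(profile, profile[1:])]


def coherent_shift(gene_a, gene_b, max_shift, tolerance=0.05, min_overlap=3):
    """Return the smallest lag s such that gene_b, delayed by s time points,
    shows the same relative changes as gene_a, or None if no lag fits.

    Illustrative assumption: coherence means every consecutive ratio in the
    overlapping windows differs by at most `tolerance`.
    """
    for shift in range(max_shift + 1):
        # Align the start of gene_a with gene_b starting at index `shift`.
        window_a = gene_a[:len(gene_a) - shift]
        window_b = gene_b[shift:]
        if len(window_a) < min_overlap:
            break  # too few shared time points to compare
        ratios_a = relative_changes(window_a)
        ratios_b = relative_changes(window_b)
        if all(abs(ra - rb) <= tolerance for ra, rb in zip(ratios_a, ratios_b)):
            return shift
    return None


# Made-up profiles: gene_b is gene_a scaled by 3 and delayed by one time point.
gene_a = [1.0, 2.0, 4.0, 2.0, 1.0, 1.0]
gene_b = [3.0, 3.0, 6.0, 12.0, 6.0, 3.0]
unrelated = [5.0, 1.0, 5.0, 1.0, 5.0, 1.0]

lag = coherent_shift(gene_a, gene_b, max_shift=2)
no_lag = coherent_shift(gene_a, unrelated, max_shift=2)
```

In this toy input the two genes differ in absolute magnitude and are not coherent when compared on the same time points, yet they match once `gene_b` is shifted by one step. A model restricted to a shared subset of dimensions would miss this kind of relationship.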