Biologists have identified a real and emerging need to find co-regulated gene clusters, which include both positively and negatively regulated gene clusters. However, existing pattern-based and tendency-based clustering approaches are designed to find only positively regulated gene clusters. In this paper, a new subspace clustering model called g-Cluster is proposed for gene expression data. The proposed model has the following advantages: 1) it finds both positively and negatively co-regulated genes at once, 2) it removes the restriction that co-regulated genes be related by a magnitude transformation, and 3) it guarantees the quality of clusters and the significance of regulations using a novel similarity measurement, gCode, and a user-specified regulation threshold δ, respectively. No previous work accomplishes this task. Moreover, an MDL technique is introduced to avoid generating insignificant g-Clusters. A tree structure, the GS-tree, is also designed, along with two algorithms that combine efficient pruning and optimization strategies to identify all qualified g-Clusters. Extensive experiments are conducted on real and synthetic datasets. The experimental results show that 1) the algorithms find a number of co-regulated gene clusters missed by previous models, which are potentially of high biological significance, and 2) the algorithms are effective and efficient, and outperform the existing approaches.
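
The distinction between positive and negative co-regulation under a regulation threshold δ can be illustrated with a minimal sketch. The encoding below is an assumption made for illustration only and is not the paper's gCode definition: each gene's change between adjacent conditions is coded as up, down, or insignificant relative to a threshold `delta`, and two genes are treated as positively co-regulated when their codes match and negatively co-regulated when one code is the sign-inverse of the other. The function names `tendency_code` and `regulation_relation` are hypothetical.

```python
from typing import List, Sequence


def tendency_code(expr: Sequence[float], delta: float) -> List[int]:
    """Code each change between adjacent conditions as +1, -1, or 0.

    A change counts as significant only if its magnitude exceeds delta
    (illustrative stand-in for a regulation threshold; not the paper's gCode).
    """
    code = []
    for prev, curr in zip(expr, expr[1:]):
        diff = curr - prev
        if diff > delta:
            code.append(1)       # significant up-regulation
        elif diff < -delta:
            code.append(-1)      # significant down-regulation
        else:
            code.append(0)       # change too small to count
    return code


def regulation_relation(a: Sequence[float], b: Sequence[float], delta: float) -> str:
    """Return 'positive', 'negative', or 'none' for two expression profiles."""
    code_a = tendency_code(a, delta)
    code_b = tendency_code(b, delta)
    # A profile with no significant change carries no regulation signal.
    if all(c == 0 for c in code_a) or all(c == 0 for c in code_b):
        return "none"
    if code_a == code_b:
        return "positive"
    if code_a == [-c for c in code_b]:
        return "negative"
    return "none"


# Hypothetical expression profiles over four conditions.
g1 = [1.0, 3.0, 2.0, 5.0]
g2 = [10.0, 14.0, 11.0, 20.0]   # same tendencies as g1, different magnitudes
g3 = [9.0, 4.0, 6.0, 1.0]       # opposite tendencies to g1
g4 = [1.0, 1.1, 1.2, 1.1]       # no change exceeds delta

relations = {
    "g1-g2": regulation_relation(g1, g2, delta=0.5),
    "g1-g3": regulation_relation(g1, g3, delta=0.5),
    "g1-g4": regulation_relation(g1, g4, delta=0.5),
}
```

In this toy setting, `g1` and `g2` match in tendency even though no single shift or scaling maps one profile onto the other, while `g3` moves in the opposite direction at every step. A tendency-only comparison of this kind is one way to see why a model need not assume a magnitude transformation between co-regulated genes.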