Efficiently mining local conserved clusters from gene expression data

Authors:
Guoren Wang;Yuhai Zhao;Xiangguo Zhao;Botao Wang;Baiyou Qiao
Affiliations:
College of Computer Science and Engineering, Northeastern University, Shenyang 110004, China (all authors)
Venue:
Neurocomputing
Year:
2010

Abstract

Extensive studies have shown that mining gene expression data is important for both bioinformatics research and biomedical applications. However, most existing studies focus only on either co-regulated gene clusters or emerging patterns. In fact, another analysis scheme, namely simultaneously mining phenotypes and diagnostic genes, is also biologically significant but has received relatively little attention so far. In this paper, we explore a novel concept, the local conserved gene cluster (LC-Cluster), to address this problem. Specifically, an LC-Cluster contains a subset of genes and a subset of conditions such that the genes show steady expression values (rather than the coherent pattern of synchronous rising and falling defined in some previous work) only on that subset of conditions, not across all given conditions. To avoid the exponential growth of the subspace search, we further present two efficient algorithms, FALCONER and E-FALCONER, which use an enumeration tree to mine the complete set of maximal LC-Clusters from gene expression data sets. Extensive experiments on both real and synthetic gene expression data sets show that (1) our approaches are efficient and effective, (2) our approaches outperform existing enumeration-tree-based algorithms, and (3) our approaches can discover a number of LC-Clusters that are potentially of high biological significance.
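
To make the LC-Cluster idea concrete, the sketch below shows one possible reading of "steady expression values on a subset of conditions". It is not the paper's FALCONER or E-FALCONER algorithm. It assumes, purely for illustration, that a gene is steady on a condition subset when the range of its values there (maximum minus minimum) is at most a threshold `delta`. The names `steady_genes`, `naive_clusters`, `delta`, `min_genes`, and `min_conds` are hypothetical, and the abstract does not give the exact definition. The search enumerates every condition subset by brute force, which is exactly the exponential cost the abstract says the proposed algorithms are designed to avoid, and it does not filter the results down to maximal clusters.

```python
from itertools import combinations


def steady_genes(matrix, conds, delta):
    """Return indices of genes whose values on `conds` span at most `delta`.

    Assumption: "steady" means max - min <= delta on the chosen conditions.
    """
    result = []
    for g, row in enumerate(matrix):
        values = [row[c] for c in conds]
        if max(values) - min(values) <= delta:
            result.append(g)
    return result


def naive_clusters(matrix, delta, min_genes, min_conds):
    """Brute-force search over all condition subsets (exponential cost).

    Returns (genes, conditions) pairs with at least `min_genes` steady genes.
    No maximality filtering is applied.
    """
    n_conds = len(matrix[0])
    found = []
    for k in range(min_conds, n_conds + 1):
        for conds in combinations(range(n_conds), k):
            genes = steady_genes(matrix, conds, delta)
            if len(genes) >= min_genes:
                found.append((genes, list(conds)))
    return found


# Toy data: rows are genes, columns are conditions (made-up values).
toy = [
    [1.0, 1.1, 5.0, 0.9],
    [2.0, 2.05, 0.1, 2.1],
    [3.0, 7.0, 3.1, 9.0],
]

for genes, conds in naive_clusters(toy, delta=0.3, min_genes=2, min_conds=2):
    print("genes", genes, "conditions", conds)
```

On the toy matrix, genes 0 and 1 are steady on conditions 0, 1, and 3 but not on condition 2, so they form a candidate cluster on that subset even though neither gene is steady across all four conditions.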