Mining microarray gene expression data is an important research topic in bioinformatics with broad applications. While most previous studies focus on clustering either genes or samples, it is interesting to ask whether the complete set of samples can be partitioned into exclusive groups (called phenotypes) while also finding a set of informative genes that manifests the phenotype structure. In this paper, we propose the new problem of simultaneously mining phenotypes and informative genes from gene expression data. Statistics-based metrics are proposed to measure the quality of the mining results. Two algorithms are developed: a heuristic search method and a mutual reinforcing adjustment method. We present an extensive performance study on both real-world and synthetic data sets. The mining results from the two proposed methods are clearly better than those from previous methods, and both methods are ready for real-world applications. Of the two, the mutual reinforcing adjustment method is in general more scalable and more effective, and it produces mining results of better quality.