Subspace clustering of microarray data based on domain transformation

Authors:
Jongeun Jun; Seokkyung Chung; Dennis McLeod
Affiliations:
Department of Computer Science, University of Southern California, Los Angeles, CA; Yahoo! Inc., Santa Clara, CA; Department of Computer Science, University of Southern California, Los Angeles, CA
Venue:
VDMB'06 Proceedings of the First international conference on Data Mining and Bioinformatics
Year:
2006

Abstract

We propose a mining framework that supports the identification of useful knowledge based on data clustering. Motivated by recent advances in microarray technology, we focus on mining gene expression datasets. In particular, given that genes are often co-expressed under subsets of experimental conditions, we present a novel subspace clustering algorithm. In contrast to previous approaches, our method is based on the observation that the number of subspace clusters is related to the number of maximal subspace clusters to which any gene pair can belong. By discretizing gene expression profiles, we transform the similarity between two genes into a sequence of symbols that represents the maximal subspace cluster for that gene pair. This domain transformation (from genes into gene-gene relations) makes the number of possible subspace clusters dependent on the number of genes. Based on these symbolic representations, we present an efficient subspace clustering algorithm that scales well with the number of dimensions. In addition, the running time can be drastically reduced by using an inverted index and pruning non-interesting subspaces. Experimental results indicate that the proposed method efficiently identifies subspace clusters of co-expressed genes in a yeast cell cycle dataset.
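To make the domain transformation concrete, here is a minimal Python sketch of the general idea. It is not the authors' algorithm, and several details are assumptions: expression values are treated as log-ratios, each value is discretized into three hypothetical symbols (`U` up, `D` down, `N` no change) using a hypothetical `threshold`, and a gene pair is encoded as a symbol sequence that keeps the shared symbol where both genes agree and writes `*` (don't care) elsewhere. The non-`*` positions of that sequence form the maximal subspace for the pair. An inverted index (here a dictionary from pair pattern to genes) groups pairs with identical patterns, and patterns with fewer than `min_conditions` specified conditions are pruned. The function names and parameters are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def discretize(profile, threshold=1.0):
    """Map each (assumed log-ratio) value to 'U' (up), 'D' (down) or 'N' (no change)."""
    return "".join(
        "U" if v > threshold else "D" if v < -threshold else "N"
        for v in profile
    )


def pair_pattern(sym_a, sym_b):
    """Encode a gene pair as a symbol sequence for its maximal shared subspace:
    keep the symbol where both genes agree, '*' (don't care) elsewhere."""
    return "".join(a if a == b else "*" for a, b in zip(sym_a, sym_b))


def subspace_clusters(profiles, threshold=1.0, min_conditions=2, min_genes=3):
    """Hypothetical sketch: group gene pairs by identical pair pattern.

    Every gene in a group carries the pattern's symbols on the pattern's
    specified conditions, so each group is coherent on that subspace
    (though not necessarily a maximal cluster)."""
    symbols = {g: discretize(p, threshold) for g, p in profiles.items()}
    # Inverted index: pair pattern -> genes from pairs with exactly that pattern.
    index = defaultdict(set)
    for g1, g2 in combinations(sorted(symbols), 2):
        pattern = pair_pattern(symbols[g1], symbols[g2])
        # Prune non-interesting subspaces with too few specified conditions.
        if len(pattern) - pattern.count("*") < min_conditions:
            continue
        index[pattern].update((g1, g2))
    # Keep only groups with enough genes.
    return {pat: genes for pat, genes in index.items() if len(genes) >= min_genes}


# Toy data (made up for illustration): four genes under four conditions.
profiles = {
    "g1": [2.0, -1.5, 0.1, 1.8],
    "g2": [1.7, -2.2, 1.9, 1.2],
    "g3": [2.5, -1.1, -1.4, 1.3],
    "g4": [0.0, 0.2, 0.3, -0.5],
}
clusters = subspace_clusters(profiles)
print({pat: sorted(genes) for pat, genes in clusters.items()})
# prints {'UD*U': ['g1', 'g2', 'g3']}
```

In this toy run, `g1`, `g2` and `g3` agree (up, down, up) on conditions 1, 2 and 4 but differ on condition 3, so all three of their pairs map to the pattern `UD*U`. The flat gene `g4` shares at most one condition with the others and is pruned. Note that the number of distinct pair patterns is bounded by the number of gene pairs rather than by the number of possible condition subsets, which is the intuition behind making cluster enumeration depend on the number of genes.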