Discovering significant OPSM subspace clusters in massive gene expression data

Authors:
Byron J. Gao;Obi L. Griffith;Martin Ester;Steven J. M. Jones
Affiliations:
Simon Fraser University, Canada;British Columbia Cancer Agency, Canada;Simon Fraser University, Canada;British Columbia Cancer Agency, Canada
Venue:
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2006

Abstract

Order-preserving submatrixes (OPSMs) have been accepted as a biologically meaningful subspace cluster model, capturing the general tendency of gene expressions across a subset of conditions. In an OPSM, the expression levels of all genes induce the same linear ordering of the conditions. OPSM mining is reducible to a special case of the sequential pattern mining problem, in which a pattern and its supporting sequences uniquely specify an OPSM cluster. Small twig clusters, specified by long patterns with naturally low support, incur explosive computational costs and would be completely pruned off by most existing methods on massive datasets containing thousands of conditions and hundreds of thousands of genes, which are common in today's gene expression analysis. However, it is of particular interest to biologists to reveal such small groups of genes that are tightly coregulated under many conditions, and some pathways or processes might require only two genes to act in concert. In this paper, we introduce the KiWi mining framework for massive datasets, which exploits two parameters, k and w, to provide biased testing on a bounded number of candidates. This substantially reduces the search space and problem scale while targeting highly promising seeds that lead to significant clusters and twig clusters. Extensive biological and computational evaluations on real datasets demonstrate that KiWi can effectively mine biologically meaningful OPSM subspace clusters with good efficiency and scalability.
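
The reduction to sequential pattern mining mentioned in the abstract can be illustrated with a minimal sketch. It is not the KiWi framework and does not use the parameters k and w. It only shows the OPSM definition: each gene (row) is turned into the sequence of condition indices sorted by increasing expression, and a pattern is supported by every row whose sequence contains it as a subsequence. The function names, the toy matrix, and the tie-breaking rule (ties resolved by condition index) are assumptions made for illustration.

```python
from typing import List, Sequence


def row_to_sequence(row: Sequence[float]) -> List[int]:
    """Return condition indices ordered by increasing expression value.

    Assumption: ties are broken by condition index (sorted() is stable).
    """
    return sorted(range(len(row)), key=lambda j: row[j])


def supports(sequence: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if `pattern` is a (not necessarily contiguous) subsequence."""
    it = iter(sequence)
    # `c in it` advances the iterator, so matches must appear in order.
    return all(c in it for c in pattern)


def opsm_rows(matrix: Sequence[Sequence[float]],
              pattern: Sequence[int]) -> List[int]:
    """Indices of rows whose condition ordering supports `pattern`.

    The pattern together with these rows specifies one OPSM cluster.
    """
    return [i for i, row in enumerate(matrix)
            if supports(row_to_sequence(row), pattern)]


# Hypothetical toy data: 4 genes (rows) x 4 conditions (columns).
matrix = [
    [0.1, 0.9, 0.5, 0.3],
    [1.0, 4.0, 3.0, 2.0],
    [0.8, 0.2, 0.6, 0.4],
    [2.0, 9.0, 5.0, 7.0],
]
pattern = [0, 3, 1]  # condition 0 < condition 3 < condition 1

print(opsm_rows(matrix, pattern))  # [0, 1, 3]
```

In this toy example, genes 0, 1, and 3 all rank condition 0 below condition 3 below condition 1, so they form an OPSM on those three conditions. Longer patterns are typically supported by fewer rows, which is why the twig clusters described above have naturally low support.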