Biclustering Gene-Feature Matrices for Statistically Significant Dense Patterns

Authors:
Mehmet Koyuturk;Wojciech Szpankowski;Ananth Grama
Affiliations:
Purdue University;Purdue University;Purdue University
Venue:
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Year:
2004

Abstract

Biclustering is an important problem that arises in diverse applications, including the analysis of gene expression and drug interaction data. The problem can be formalized in various ways, depending on how the data are interpreted and which optimization function is used. We focus on the problem of finding unusually dense patterns in binary (0-1) matrices. This formulation is appropriate for analyzing experimental datasets that come not only from binary quantization of gene expression data, but also from more comprehensive datasets such as gene-feature matrices that include functions of coded proteins and motifs in the coding sequence. We formalize the notion of an "unusually" dense submatrix to evaluate the interestingness of a pattern in terms of statistical significance, based on the assumption of a uniform memoryless source. We then simplify this formulation to assess the statistical significance of discovered patterns. Using statistical significance as an objective function, we formulate the problem as one of finding significant dense submatrices of a large sparse matrix. Adopting a simple iterative heuristic along with randomized initialization techniques, we derive fast algorithms for discovering binary biclusters. We conduct experiments on a binary gene-feature matrix and a quantized breast tumor gene expression matrix. Our experimental results show that the proposed method quickly discovers all interesting patterns in these datasets.
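
To make the idea of an "unusually" dense submatrix concrete, the following minimal Python sketch scores a candidate submatrix of a binary matrix. It is an illustration under stated assumptions, not the paper's actual significance measure, which the abstract does not spell out. It assumes that a uniform memoryless source means every cell is 1 independently with the same probability p, estimates p as the overall density of the matrix, and uses the exact binomial upper tail as the score. It also ignores the correction needed for the many possible choices of rows and columns. The names `binomial_tail` and `submatrix_p_value` are hypothetical.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P[X >= k] for X ~ Binomial(n, p), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def submatrix_p_value(matrix, rows, cols):
    """Tail probability of seeing at least the observed number of ones
    in the submatrix rows x cols, assuming (our assumption) that each
    cell is an independent Bernoulli(p) draw with p set to the overall
    density of the matrix."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    p = sum(sum(row) for row in matrix) / (n_rows * n_cols)
    cells = len(rows) * len(cols)
    ones = sum(matrix[i][j] for i in rows for j in cols)
    return binomial_tail(cells, ones, p)


# Toy 6 x 6 binary matrix with a fully dense 3 x 2 block in the top-left corner.
A = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

dense = submatrix_p_value(A, rows=[0, 1, 2], cols=[0, 1])
sparse = submatrix_p_value(A, rows=[3, 4, 5], cols=[2, 3])
print(f"dense block:  {dense:.2e}")
print(f"sparse block: {sparse:.2e}")
```

A smaller tail probability marks a block as more surprising under the null model, so a search procedure could use it (or its negative logarithm) as the objective to optimize over row and column subsets.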