Finding Additive Biclusters with Random Background

Authors:
Jing Xiao;Lusheng Wang;Xiaowen Liu;Tao Jiang
Affiliations:
Department of Computer Science and Technology, Tsinghua University;Department of Computer Science, City University of Hong Kong, Hong Kong;Department of Computer Science, University of Western Ontario, London, Canada N6A 5B7;Department of Computer Science and Engineering, University of California, Riverside
Venue:
CPM '08: Proceedings of the 19th Annual Symposium on Combinatorial Pattern Matching
Year:
2008

Abstract

The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an $n \times m$ matrix $A$ ($n \ge m$), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of $A$) is optimized. The problem has been proved or conjectured to be NP-hard under various mathematical models. In this paper, we study a probabilistic model of the implanted additive bicluster problem, where each element in the $n \times m$ background matrix is a random number from $[0, L-1]$, and a $k \times k$ implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in $[0, L-1]$ with probability $\theta$. We propose an $O(n^2 m)$-time voting algorithm to solve the problem. We show that for any constant $\delta$ such that $(1-\delta)(1-\theta)^2 - \frac{1}{L} > 0$, when $k \ge \max \left\{\frac{8}{\alpha} \sqrt{n\log n},~ \frac{8 \log n}{c} + \log (2L)\right\}$, where $c$ is a constant, the voting algorithm can correctly find the implanted bicluster with probability at least $1 - \frac{9}{n^{2}}$. We also implement our algorithm as a software tool, called VOTE, for finding novel biclusters in microarray gene expression data. The implementation incorporates several nontrivial ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with multiple (and overlapping) implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with high accuracy and speed.
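To make the probabilistic model concrete, the following is a minimal Python sketch of how one problem instance could be sampled. It is an illustration under stated assumptions, not the authors' generator: the function name `implant_additive_bicluster`, the uniform random choice of bicluster rows and columns, and drawing row offsets r and column offsets s from [0, ⌊(L−1)/2⌋] (so every error-free entry r_i + s_j stays inside [0, L−1]) are all hypothetical choices.

```python
import random


def implant_additive_bicluster(n, m, k, L, theta, seed=None):
    """Sample an n x m background with one k x k noisy additive bicluster.

    Hypothetical assumptions (not from the paper): bicluster rows/columns are
    chosen uniformly at random, and the error-free entry at (i, j) is
    r[i] + s[j] with r, s drawn from [0, (L - 1) // 2].
    """
    rng = random.Random(seed)
    # Background: every entry is uniform on [0, L - 1].
    A = [[rng.randrange(L) for _ in range(m)] for _ in range(n)]
    rows = sorted(rng.sample(range(n), k))
    cols = sorted(rng.sample(range(m), k))
    half = (L - 1) // 2
    r = {i: rng.randint(0, half) for i in rows}
    s = {j: rng.randint(0, half) for j in cols}
    for i in rows:
        for j in cols:
            if rng.random() < theta:
                # With probability theta the entry becomes a random value.
                A[i][j] = rng.randrange(L)
            else:
                # Otherwise it keeps its additive value r_i + s_j <= L - 1.
                A[i][j] = r[i] + s[j]
    return A, rows, cols


A, rows, cols = implant_additive_bicluster(n=200, m=100, k=30, L=16, theta=0.1, seed=7)
```

With `theta = 0`, any two bicluster rows differ by the same constant on every bicluster column; noise with probability θ breaks this for a fraction of entries, which is why the condition in the abstract involves $(1-\theta)^2$ (both entries of a row pair must be error-free).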
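The abstract does not spell out the voting procedure itself. The toy sketch below only illustrates why pairwise voting fits an $O(n^2 m)$ budget: two rows of the same additive bicluster differ by a constant on the bicluster columns, so the most frequent value of their difference vector occurs often, whereas for two independent uniform background entries any particular difference value occurs with probability at most $1/L$. The functions `pair_mode` and `vote_bicluster`, the fixed `threshold`, the seed-row rule, and the majority column vote are hypothetical simplifications, not the VOTE algorithm.

```python
from collections import Counter


def pair_mode(A, i, j):
    """Return (d, count): the most frequent value d of A[i][c] - A[j][c] over all columns c."""
    diffs = Counter(a - b for a, b in zip(A[i], A[j]))
    return diffs.most_common(1)[0]


def vote_bicluster(A, threshold):
    """Toy pairwise-voting scheme for a single additive bicluster (illustrative only)."""
    n, m = len(A), len(A[0])
    # Row vote: comparing all n(n-1)/2 row pairs over m columns costs O(n^2 m).
    partners = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            _, count = pair_mode(A, i, j)
            if count >= threshold:  # hypothetical fixed threshold
                partners[i].append(j)
                partners[j].append(i)
    # Seed row: the row that agrees with the largest number of other rows.
    seed = max(range(n), key=lambda i: len(partners[i]))
    rows = sorted([seed] + partners[seed])
    # Column vote: keep column c if a strict majority of partner rows differ
    # from the seed row at c by exactly their pairwise mode.
    offsets = {p: pair_mode(A, seed, p)[0] for p in partners[seed]}
    cols = []
    for c in range(m):
        votes = sum(1 for p, d in offsets.items() if A[seed][c] - A[p][c] == d)
        if 2 * votes > len(offsets):
            cols.append(c)
    return rows, cols


# Rows 0-2 and columns 0-3 form an error-free additive bicluster
# (entries r_i + s_j with r = (0, 10, 20) and s = (1, 2, 3, 4)).
EXAMPLE = [
    [1, 2, 3, 4, 50, 61],
    [11, 12, 13, 14, 70, 33],
    [21, 22, 23, 24, 5, 90],
    [40, 7, 81, 66, 17, 29],
    [3, 95, 44, 58, 72, 8],
]

print(vote_bicluster(EXAMPLE, threshold=3))  # → ([0, 1, 2], [0, 1, 2, 3])
```

In the example, rows 3 and 4 share a difference value on at most two columns with any other row, so they never reach the threshold, while each pair among rows 0 to 2 agrees on all four bicluster columns.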
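The sufficient condition on the bicluster size $k$ can be evaluated numerically. The sketch below assumes natural logarithms and treats $\alpha$ and $c$ as caller-supplied constants, since the abstract does not state their values; the function name `min_bicluster_size` is hypothetical, and the parameter names simply mirror the symbols in the abstract.

```python
import math


def min_bicluster_size(n, L, theta, delta, alpha, c):
    """Smallest integer k meeting the sufficient condition stated in the abstract.

    Assumes natural logarithms; alpha and c are supplied by the caller.
    """
    # The guarantee is stated only for delta with (1 - delta)(1 - theta)^2 - 1/L > 0.
    if (1 - delta) * (1 - theta) ** 2 - 1 / L <= 0:
        raise ValueError("requires (1 - delta) * (1 - theta)**2 - 1/L > 0")
    log_n = math.log(n)
    bound = max(8 / alpha * math.sqrt(n * log_n),
                8 * log_n / c + math.log(2 * L))
    return math.ceil(bound)


k = min_bicluster_size(n=100_000, L=16, theta=0.1, delta=0.1, alpha=0.5, c=1.0)
print(k)  # → 17168
```

Here the $\frac{8}{\alpha}\sqrt{n \log n}$ term dominates. The bound is only informative when it is below $n$, which for moderate $\alpha$ requires fairly large $n$; at $n = 100{,}000$ the stated success probability $1 - 9/n^2$ is $1 - 9 \times 10^{-10}$.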