The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an $n \times m$ matrix $A$ ($n \ge m$), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of $A$) is optimized. The problem has been proved or conjectured to be NP-hard under various mathematical models. In this paper, we study a probabilistic model of the implanted additive bicluster problem, where each element in the $n \times m$ background matrix is a random number from $[0, L-1]$, and a $k \times k$ implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in $[0, L-1]$ with probability $\theta$. We propose an $O(n^2 m)$ time voting algorithm to solve the problem. We show that for any constant $\delta$ such that $(1-\delta)(1-\theta)^2 - \frac{1}{L} > 0$, when $k \ge \max \left\{\frac{8}{\alpha} \sqrt{n \log n},~ \frac{8 \log n}{c} + \log (2L)\right\}$, where $c$ is a constant, the voting algorithm can correctly find the implanted bicluster with probability at least $1 - \frac{9}{n^{2}}$. We also implement our algorithm as a software tool, called VOTE, for finding novel biclusters in microarray gene expression data. The implementation incorporates several nontrivial ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with multiple (and overlapping) implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with high accuracy and speed.
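The probabilistic model described above can be illustrated with a minimal sketch that generates a background matrix and implants a noisy additive bicluster. This is not the authors' code and it does not implement the voting algorithm. It assumes that entries are integers drawn uniformly from $[0, L-1]$, that an error-free additive bicluster has entries of the form (row offset + column offset) mod $L$, and that the bicluster rows and columns are chosen uniformly at random. The function name `implant_additive_bicluster` and its parameters are hypothetical.

```python
import random


def implant_additive_bicluster(n, m, k, L, theta, seed=0):
    """Return (A, rows, cols): an n x m matrix with a noisy k x k additive bicluster.

    Assumptions for illustration only (not taken from the paper):
    - entries are integers drawn uniformly from {0, ..., L-1};
    - the error-free bicluster entry is (row_off + col_off) mod L;
    - bicluster rows and columns are sampled uniformly at random.
    """
    rng = random.Random(seed)

    # Background matrix: every entry is uniform on {0, ..., L-1}.
    A = [[rng.randrange(L) for _ in range(m)] for _ in range(n)]

    # Positions of the implanted bicluster.
    rows = sorted(rng.sample(range(n), k))
    cols = sorted(rng.sample(range(m), k))

    # Offsets that define the error-free additive pattern.
    row_off = [rng.randrange(L) for _ in range(k)]
    col_off = [rng.randrange(L) for _ in range(k)]

    for a, i in enumerate(rows):
        for b, j in enumerate(cols):
            if rng.random() < theta:
                # Noise: replace the entry with a random value.
                A[i][j] = rng.randrange(L)
            else:
                A[i][j] = (row_off[a] + col_off[b]) % L
    return A, rows, cols


if __name__ == "__main__":
    A, rows, cols = implant_additive_bicluster(n=12, m=8, k=4, L=10, theta=0.1)
    print("bicluster rows:", rows)
    print("bicluster cols:", cols)
```

With `theta=0.0` the implanted submatrix is error-free under these assumptions, so the difference between any two bicluster rows, taken mod $L$, is the same in every bicluster column.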