Finding biclusters by random projections

Authors:
Stefano Lonardi;Wojciech Szpankowski;Qiaofeng Yang
Affiliations:
Department of Computer Science and Engineering, University of California, Riverside, CA, USA;Department of Computer Sciences, Purdue University, West Lafayette, IN, USA;Department of Computer Science and Engineering, University of California, Riverside, CA, USA
Venue:
Theoretical Computer Science
Year:
2006

Abstract

Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that every remaining row reads the same string. In this paper, we address the problem of finding the bicluster with the largest area in a large matrix X. We first prove that the problem is NP-complete. We then present a fast randomized algorithm that discovers the largest bicluster by random projections. We give a detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions, and we report the results of extensive simulations on synthetic data.
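
To make the definition concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the abstract does not spell out the procedure, so the function names (`is_bicluster`, `find_bicluster`) and the parameters `k` (number of sampled columns) and `iterations` are hypothetical. The sketch repeatedly projects every row onto a random subset of columns, groups the rows whose projections read the same string, and then extends each group to every column on which its rows agree, keeping the candidate with the largest area (rows times columns).

```python
import random
from collections import defaultdict


def is_bicluster(X, rows, cols):
    """Return True if every selected row reads the same string on the selected columns."""
    if not rows or not cols:
        return False
    first = [X[rows[0]][c] for c in cols]
    return all([X[r][c] for c in cols] == first for r in rows)


def find_bicluster(X, k, iterations, seed=0):
    """Illustrative random-projection search (hypothetical parameters k, iterations).

    X is a list of equal-length rows (strings or lists of symbols).
    Returns (rows, cols) for the largest-area bicluster seen, which is
    not guaranteed to be the optimum.
    """
    rng = random.Random(seed)
    n_cols = len(X[0])
    best_rows, best_cols = [], []
    for _ in range(iterations):
        # Random projection: keep only k randomly chosen columns.
        sample = rng.sample(range(n_cols), k)
        # Group rows whose projections are identical.
        groups = defaultdict(list)
        for r, row in enumerate(X):
            groups[tuple(row[c] for c in sample)].append(r)
        for rows in groups.values():
            # Extend to every column on which the grouped rows agree.
            cols = [c for c in range(n_cols) if len({X[r][c] for r in rows}) == 1]
            if len(rows) * len(cols) > len(best_rows) * len(best_cols):
                best_rows, best_cols = rows, cols
    return best_rows, best_cols
```

Calling `find_bicluster(X, k=1, iterations=50)` on a small matrix such as `["xaybz", "qaybr", "maybn", "pqrst"]` returns a row set and a column set that can be checked with `is_bicluster`. Because the search is randomized, a larger `iterations` raises the chance that some projection falls entirely inside the columns of the largest bicluster; choosing `k` trades off how selective each projection is against how likely it is to land inside those columns.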