Finding biclusters by random projections

  • Authors:
  • Stefano Lonardi; Wojciech Szpankowski; Qiaofeng Yang

  • Affiliations:
  • Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Department of Computer Sciences, Purdue University, West Lafayette, IN, USA; Department of Computer Science and Engineering, University of California, Riverside, CA, USA

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2006

Abstract

Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string. In this paper, we are concerned with the problem of finding the bicluster with the largest area in a large matrix X. The problem is first proved to be NP-complete. We present a fast and efficient randomized algorithm that discovers the largest bicluster by random projections. A detailed probabilistic analysis of the algorithm and an asymptotic study of the statistical significance of the solutions are given. We report results of extensive simulations on synthetic data.
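
To make the definition concrete, the following is a minimal sketch of the random-projection idea, not the algorithm analyzed in the paper, whose details the abstract does not give. It assumes X is a list of equal-length strings. The function names and the parameters k (number of sampled columns), iterations, and seed are hypothetical and chosen for illustration only. Each round samples k columns, groups the rows that agree on them, extends each group to every column on which its rows agree, and keeps the candidate with the largest area.

```python
import random
from collections import defaultdict


def is_bicluster(X, rows, cols):
    """Return True if every selected row reads the same string on the selected columns."""
    if not rows or not cols:
        return False
    reference = [X[rows[0]][c] for c in cols]
    return all([X[r][c] for c in cols] == reference for r in rows)


def largest_bicluster_sketch(X, k, iterations, seed=0):
    """Heuristic search for a large bicluster (illustrative assumption, not the paper's method).

    X          : list of equal-length strings (one string per row)
    k          : number of columns sampled per round (hypothetical parameter, k <= number of columns)
    iterations : number of random rounds (hypothetical parameter)
    """
    rng = random.Random(seed)
    n_cols = len(X[0])
    best_rows, best_cols = [], []
    for _ in range(iterations):
        # Random projection: keep only k randomly chosen columns.
        sample = rng.sample(range(n_cols), k)
        # Rows with identical projections land in the same group.
        groups = defaultdict(list)
        for r, row in enumerate(X):
            groups[tuple(row[c] for c in sample)].append(r)
        for rows in groups.values():
            # Extend the group to every column on which all of its rows agree.
            first = X[rows[0]]
            cols = [c for c in range(n_cols)
                    if all(X[r][c] == first[c] for r in rows)]
            if len(rows) * len(cols) > len(best_rows) * len(best_cols):
                best_rows, best_cols = rows, cols
    return best_rows, best_cols
```

A call such as largest_bicluster_sketch(["abcx", "abcy", "abcz", "qrst"], k=2, iterations=50) returns a pair (rows, cols) that can be checked with is_bicluster. Because the search is randomized, more rounds raise the chance that some sample falls entirely inside the columns of the largest bicluster; the sketch gives no guarantee of optimality.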