Biclustering Algorithms for Biological Data Analysis: A Survey
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Finding biclusters by random projections
Theoretical Computer Science
Crossing minimization in weighted bipartite graphs
WEA'07 Proceedings of the 6th international conference on Experimental algorithms
Clustering refers to the process of organizing a set of input vectors into clusters based on similarity, defined according to some preset distance measure. In many cases it is more desirable to cluster the dimensions simultaneously with the vectors themselves. This special instance of clustering, referred to as biclustering, was introduced by Hartigan [3]. It has many applications in areas including data mining, pattern recognition, and computational biology, and it has received considerable attention in gene expression data analysis; see [5] for a survey. In that setting the input is represented as a data matrix whose rows and columns correspond to genes and conditions, respectively. Each entry in the matrix reflects the expression level of a gene under a certain condition. From a graph-theoretical perspective, the data matrix can be viewed as a weighted bipartite graph, where the vertex set of one partition is the set of genes and the vertex set of the other partition is the set of conditions. A weighted edge between a gene and a condition, where one exists, reflects the expression level of the gene under that specific experimental condition. The biclustering problem may then be described in terms of the various versions of the biclique extraction problem in bipartite graphs. Many interesting versions that directly apply to the biclustering problem are NP-hard [4]. Various graph-theoretical approaches employing heuristics have been suggested [1,4,6,7].
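The correspondence between a data matrix and a weighted bipartite graph can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `matrix_to_bipartite` and `is_biclique`, the toy expression values, and the convention that a zero entry means "no edge" are all assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str]  # (gene, condition)


def matrix_to_bipartite(
    matrix: Sequence[Sequence[float]],
    genes: Sequence[str],
    conditions: Sequence[str],
    absent: float = 0.0,
) -> Dict[Edge, float]:
    """Map a gene-by-condition matrix to weighted bipartite edges.

    Assumption (not from the source): an entry equal to `absent`
    means there is no edge between that gene and that condition.
    """
    edges: Dict[Edge, float] = {}
    for i, gene in enumerate(genes):
        for j, cond in enumerate(conditions):
            weight = matrix[i][j]
            if weight != absent:
                edges[(gene, cond)] = weight
    return edges


def is_biclique(
    edges: Dict[Edge, float],
    gene_subset: Sequence[str],
    cond_subset: Sequence[str],
) -> bool:
    """Check that every chosen gene is adjacent to every chosen condition."""
    return all((g, c) in edges for g in gene_subset for c in cond_subset)


# Hypothetical toy data: 3 genes (rows) under 3 conditions (columns).
genes: List[str] = ["g1", "g2", "g3"]
conditions: List[str] = ["c1", "c2", "c3"]
matrix = [
    [2.5, 0.0, 1.2],
    [3.1, 0.0, 0.9],
    [0.0, 4.0, 0.0],
]

edges = matrix_to_bipartite(matrix, genes, conditions)
candidate = is_biclique(edges, ["g1", "g2"], ["c1", "c3"])  # True
rejected = is_biclique(edges, ["g1", "g3"], ["c1"])         # False: no (g3, c1) edge
```

Here the pair ({g1, g2}, {c1, c3}) is a biclique and so a candidate bicluster, whereas verifying a given candidate is easy and finding the best one is the hard part. The sketch only checks a supplied subset; it does not attempt the extraction problem itself.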