High-throughput ChIP-Seq data reveal binding events together with their possible binding locations and strengths, and the locations of the detected peaks must then be interpreted. Recent methods tend to summarize all ChIP-Seq peaks detected within a limited region upstream and downstream of each gene into a single real-valued score that quantifies the probability of regulation in that region. Applying subspace clustering techniques to these scores can help uncover important knowledge, such as potential co-regulation or co-factor mechanisms. Ideally, the resulting biclusters would contain subsets of genes and transcription factors (TFs) whose cell values are distributed around a mean value with very low variance. Such biclusters would indicate sets of TFs that regulate sets of genes with very similar probability values. However, most existing biclustering algorithms neither enforce low variance as a desired property of a bicluster nor use variance as a guiding metric when searching for desirable biclusters. In this paper we present an algorithm that searches the space of all overlapping biclusters, organized in a lattice, and uses an upper bound on the variance of biclusters as its guiding metric. We show that the algorithm is an efficient and effective method for discovering possibly overlapping biclusters under predefined variance bounds. We present the algorithm and its results on synthetic, ChIP-Seq, and motif datasets, and compare these results with those obtained by other algorithms to demonstrate its effectiveness.
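The sketch below is a minimal illustration of the target property. It is not the authors' lattice-based algorithm. It scores a candidate bicluster by the population variance of its cells, then exhaustively enumerates the gene and TF subsets whose variance stays within a bound, keeping only the maximal ones. The function names, the `max_var`, `min_rows` and `min_cols` parameters, and the toy score matrix (rows as genes, columns as TFs) are all hypothetical.

```python
from itertools import combinations
from statistics import pvariance


def bicluster_variance(matrix, rows, cols):
    """Population variance of the cells matrix[r][c] for r in rows, c in cols."""
    cells = [matrix[r][c] for r in rows for c in cols]
    return pvariance(cells)


def low_variance_biclusters(matrix, max_var, min_rows=2, min_cols=2):
    """Brute-force search (hypothetical helper, exponential cost, toy sizes only).

    Returns every maximal (row subset, column subset) pair whose cell
    variance is <= max_var.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    found = []
    for k in range(min_rows, n_rows + 1):
        for rows in combinations(range(n_rows), k):
            for m in range(min_cols, n_cols + 1):
                for cols in combinations(range(n_cols), m):
                    if bicluster_variance(matrix, rows, cols) <= max_var:
                        found.append((rows, cols))
    # Keep a bicluster only if no other qualifying bicluster contains it.
    return [
        b for b in found
        if not any(
            b != o and set(b[0]) <= set(o[0]) and set(b[1]) <= set(o[1])
            for o in found
        )
    ]


# Toy gene-by-TF score matrix (invented values, for illustration only).
scores = [
    [0.90, 0.88, 0.10],
    [0.91, 0.89, 0.50],
    [0.20, 0.87, 0.30],
]
print(low_variance_biclusters(scores, max_var=0.001))  # [((0, 1), (0, 1))]
```

Exhaustive enumeration is exponential in the matrix size, so real data call for a structured search. Variance is also not monotone when rows or columns are added. For example, appending the row [5, 5] to the 1×2 bicluster [0, 10] lowers the variance from 25 to 12.5. A simple Apriori-style rule that prunes every superset of a failing bicluster therefore does not apply directly.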