Algorithm for low-variance biclusters to identify coregulation modules in sequencing datasets

Authors:
Zhen Hu; Raj Bhatnagar
Affiliations:
University of Cincinnati; University of Cincinnati
Venue:
Proceedings of the Tenth International Workshop on Data Mining in Bioinformatics
Year:
2011

Abstract

High-throughput sequencing (ChIP-Seq) data capture binding events, including possible binding locations and their strengths, which are then interpreted through the locations of peaks. Recent methods tend to summarize all ChIP-Seq peaks detected within a limited region upstream and downstream of each gene into a single real-valued score that quantifies the probability of regulation in that region. Applying subspace clustering (biclustering) techniques to these scores can reveal important knowledge such as potential co-regulation or co-factor mechanisms. Ideal biclusters contain subsets of genes and transcription factors (TFs) such that the cell values in each bicluster are distributed around a mean value with low variance. Such biclusters indicate TF sets that regulate gene sets with the same probability values. However, most existing biclustering algorithms can neither enforce variance as a strict limit on the values contained in a bicluster nor use variance as the guiding metric while searching for desirable biclusters. This paper presents an algorithm whose search spaces are defined by lattices containing all overlapping biclusters and whose guiding metric is a bound on variance values. The algorithm is shown to be an efficient and effective method for discovering possibly overlapping biclusters under pre-defined variance bounds. We present the algorithm and its results on synthetic, ChIP-Seq, and motif datasets, and compare them with results obtained by other algorithms to demonstrate its power and effectiveness.
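
To make the low-variance criterion concrete, the following minimal sketch checks whether a chosen set of genes (rows) and TFs (columns) forms a bicluster whose cell values stay within a variance bound. The function names, the toy score matrix, and the choice of population variance are illustrative assumptions, not the authors' implementation. The lattice-based search described in the abstract is not reproduced here; only the acceptance test on a candidate bicluster is shown.

```python
from statistics import pvariance


def bicluster_values(scores, genes, tfs):
    """Collect the cells of the submatrix selected by row and column indices."""
    return [scores[g][t] for g in genes for t in tfs]


def within_variance_bound(scores, genes, tfs, max_variance):
    """Return True if the selected cells have population variance <= max_variance."""
    values = bicluster_values(scores, genes, tfs)
    return pvariance(values) <= max_variance


# Hypothetical gene-by-TF score matrix (rows: genes, columns: TFs).
scores = [
    [0.90, 0.88, 0.10],
    [0.92, 0.91, 0.75],
    [0.15, 0.20, 0.80],
]

# Genes 0-1 with TFs 0-1 cluster tightly around 0.9, so the bound holds.
tight = within_variance_bound(scores, genes=[0, 1], tfs=[0, 1], max_variance=0.01)

# The full matrix mixes high and low scores, so the same bound is violated.
loose = within_variance_bound(scores, genes=[0, 1, 2], tfs=[0, 1, 2], max_variance=0.01)
```

In this toy case `tight` is true and `loose` is false, which mirrors the idea of accepting only those gene and TF subsets whose scores concentrate around a common mean.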