Mining low-variance biclusters to discover coregulation modules in sequencing datasets

Authors:
Zhen Hu;Raj Bhatnagar
Affiliations:
School of Computing Sciences and Informatics, University of Cincinnati, Cincinnati, OH, USA. E-mails: huze@mail.uc.edu, Raj.Bhatnagar@uc.edu
Venue:
Scientific Programming - Biological Knowledge Discovery and Data Mining
Year:
2012


Abstract

High-throughput ChIP-Seq sequencing data report binding events together with their possible binding locations and strengths; the locations of the resulting peaks must then be interpreted. Recent methods tend to summarize all ChIP-Seq peaks detected within a limited region upstream and downstream of each gene into a single real-valued score that quantifies the probability of regulation in that region. Applying subspace clustering techniques to these scores can help discover important knowledge such as potential co-regulation or co-factor mechanisms. An ideal bicluster would contain a subset of genes and a subset of transcription factors (TFs) such that the cell values in the bicluster are distributed around a mean value with very low variance. Such biclusters would indicate TF sets regulating gene sets with very similar probability values. However, most existing biclustering algorithms neither enforce low variance as a desired property of a bicluster nor use variance as a guiding metric while searching for desirable biclusters. In this paper we present an algorithm that searches the space of all overlapping biclusters, organized in a lattice, and uses an upper bound on the variance of biclusters as the guiding metric. We show the algorithm to be an efficient and effective method for discovering possibly overlapping biclusters under pre-defined variance bounds. We present the algorithm and its results on synthetic, ChIP-Seq, and motif datasets, and compare them with the results obtained by other algorithms to demonstrate the power and effectiveness of our algorithm.
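To make the low-variance criterion concrete, the following is a minimal sketch of how a candidate bicluster could be tested against a variance bound. It is not the lattice search described in the abstract. The function names, the toy score matrix, and the bound `max_variance=0.01` are all hypothetical, and using the population variance is an assumption, since the abstract does not say which variance estimator is used.

```python
import numpy as np

def bicluster_variance(scores, genes, tfs):
    """Population variance of the cells selected by the gene (row) and TF (column) indices."""
    sub = scores[np.ix_(genes, tfs)]  # submatrix for the chosen rows and columns
    return float(sub.var())

def satisfies_variance_bound(scores, genes, tfs, max_variance):
    """True if the candidate bicluster's variance does not exceed the bound."""
    return bicluster_variance(scores, genes, tfs) <= max_variance

# Hypothetical toy data: rows are genes, columns are TFs, cells are regulation scores.
scores = np.array([
    [0.90, 0.88, 0.10],
    [0.91, 0.89, 0.75],
    [0.20, 0.95, 0.30],
])

# Genes {0, 1} with TFs {0, 1} have nearly identical scores, so the bound holds.
tight = satisfies_variance_bound(scores, [0, 1], [0, 1], max_variance=0.01)

# The full matrix mixes high and low scores, so the same bound is violated.
loose = satisfies_variance_bound(scores, [0, 1, 2], [0, 1, 2], max_variance=0.01)
```

A check of this kind only evaluates one candidate at a time; deciding which candidates to evaluate, and how to prune the search, is the part the paper's lattice-based algorithm addresses.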