Defining transcription modules using large-scale gene expression data

Authors:
Jan Ihmels;Sven Bergmann;Naama Barkai
Affiliations:
Department of Molecular Genetics;Department of Molecular Genetics;Department of Molecular Genetics
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: Large-scale gene expression data comprising a variety of cellular conditions hold the promise of a global view on the transcription program. While conventional clustering algorithms have been successfully applied to smaller datasets, the utility of many algorithms for the analysis of large-scale data is limited by their inability to capture combinatorial and condition-specific co-regulation. In addition, there is an increasing need to integrate the rapidly accumulating body of other high-throughput biological data with the expression analysis. In a previous work, we introduced the signature algorithm, which overcomes the problems of conventional clustering and allows for intuitive integration of additional biological data. However, this approach is constrained by the comprehensiveness of relevant external data and by its limited ability to capture hierarchical modularity.

Methods: We present a novel method for the analysis of large-scale expression data, which assigns genes to context-dependent and potentially overlapping regulatory units. We introduce the notion of a transcription module as a self-consistent regulatory unit consisting of a set of co-regulated genes as well as the experimental conditions that induce their co-regulation. Self-consistency is defined by a rigorous mathematical criterion. We propose an efficient algorithm to identify such modules, which is based on the iterative application of the signature algorithm. A threshold parameter that determines the resolution of the modular decomposition is introduced.

Results: The method is applied systematically to over 1000 expression profiles of the yeast Saccharomyces cerevisiae, and the results are presented using two complementary visualization schemes we developed. The average biological coherence, as measured by the conservation of putative cis-regulatory motifs between four related yeast species, is higher for transcription modules than for clusters identified by other methods applied to the same dataset. Our method is related to singular value decomposition (SVD) and to the pairwise average linkage clustering algorithm. It extends SVD by filtering out noise in the expression data and offering variable resolution to reveal hierarchical organization. It furthermore has the advantage over both methods of capturing overlapping modules in the presence of combinatorial regulation.

Supplementary information: http://www.weizmann.ac.il/~barkai/modules
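
The iterative scheme described under Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the exact self-consistency criterion or normalization, so the z-score normalization, the one-sided selection of high-scoring conditions and genes, the default thresholds, and the names `standardize`, `find_module` and `demo` are all assumptions made for illustration. The sketch alternates between scoring conditions from the current gene set and scoring genes from the selected conditions, and stops when neither set changes, which is one simple reading of "self-consistent".

```python
import numpy as np


def standardize(values, axis=None):
    """Shift to zero mean and scale to unit variance along `axis`."""
    mean = values.mean(axis=axis, keepdims=True)
    std = values.std(axis=axis, keepdims=True)
    return (values - mean) / np.where(std > 0, std, 1.0)


def find_module(expr, seed_genes, t_gene=2.0, t_cond=1.0, max_iter=50):
    """Iterate a signature-style step from a seed gene set until it is stable.

    expr is a genes x conditions matrix. The thresholds and the z-score
    normalization are illustrative assumptions, not values from the paper.
    Returns (gene_mask, condition_mask, converged).
    """
    by_gene = standardize(expr, axis=1)  # each gene across conditions
    by_cond = standardize(expr, axis=0)  # each condition across genes

    genes = np.zeros(expr.shape[0], dtype=bool)
    genes[list(seed_genes)] = True
    conds = np.zeros(expr.shape[1], dtype=bool)

    for _ in range(max_iter):
        # Score each condition by the average expression of the current genes.
        cond_scores = standardize(by_gene[genes].mean(axis=0))
        new_conds = cond_scores > t_cond
        if not new_conds.any():
            return genes, new_conds, False

        # Score each gene over the selected conditions, weighted by their scores.
        weights = cond_scores[new_conds]
        gene_scores = standardize(by_cond[:, new_conds] @ weights)
        new_genes = gene_scores > t_gene
        if not new_genes.any():
            return new_genes, new_conds, False

        # Fixed point: the gene set reproduces itself through its conditions.
        if np.array_equal(new_genes, genes) and np.array_equal(new_conds, conds):
            return genes, conds, True
        genes, conds = new_genes, new_conds

    return genes, conds, False


def demo():
    """Plant one module in synthetic noise and try to recover it from a partial seed."""
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(200, 40))
    expr[:20, :10] += 5.0  # planted module: genes 0-19, conditions 0-9
    genes, conds, converged = find_module(expr, seed_genes=range(10))
    return np.flatnonzero(genes), np.flatnonzero(conds), converged
```

In this sketch, raising `t_gene` yields smaller, tighter gene sets and lowering it yields larger ones, which loosely mirrors the role the abstract assigns to the threshold parameter. Running the search from many different seeds would be one way to obtain overlapping modules, since each seed converges independently.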