Identification of transcription factor binding sites with variable-order Bayesian networks

Authors:
I. Ben-Gal, A. Shani, A. Gohr, J. Grau, S. Arviv, A. Shmilovici, S. Posch, I. Grosse
Affiliations:
Department of Industrial Engineering, Tel-Aviv University, Tel-Aviv 69978, Israel (I. Ben-Gal, A. Shani, S. Arviv)
Institute of Plant Genetics and Crop Plant Research (IPK), 06466 Gatersleben, Germany (A. Gohr, J. Grau, I. Grosse)
Department of Information Systems Engineering, Ben-Gurion University, Beer-Sheva 84105, Israel (A. Shmilovici)
Institute of Computer Science, University Halle, 06099 Halle (Saale), Germany (S. Posch)
Venue:
Bioinformatics
Year:
2005

Abstract

Motivation: We propose a new class of variable-order Bayesian network (VOBN) models for the identification of transcription factor binding sites (TFBSs). The proposed models generalize the widely used position weight matrix (PWM) models, Markov models and Bayesian network models. In those models, the dependencies of each position are modeled with a fixed subset of the remaining positions; in VOBN models, this subset may vary with the specific nucleotides observed, which are called the context. This flexibility is advantageous for the classification and analysis of TFBSs, because statistical dependencies between nucleotides at different TFBS positions (not necessarily adjacent) can be taken into account efficiently, in a position-specific and context-specific manner.

Results: We apply the VOBN model to a set of 238 experimentally verified sigma-70 binding sites in Escherichia coli. We find that the VOBN model distinguishes these 238 sites from a set of 472 intergenic 'non-promoter' sequences with higher accuracy than fixed-order Markov models or Bayesian trees. We use a replicated stratified-holdout experiment with a fixed true-negative rate of 99.9%. For a foreground inhomogeneous VOBN model of order 1 and a background homogeneous variable-order Markov (VOM) model of order 5, the mean true-positive (TP) rate is 47.56%. In comparison, the best TP rate among the conventional models is 44.39%, obtained with a foreground PWM model and a background 2nd-order Markov model. As the standard deviation of the estimated TP rate is ∼0.01%, this improvement is highly significant.

Availability: All datasets are available upon request from the authors. A web server for utilizing the VOBN and VOM models is available at http://www.eng.tau.ac.il/~bengal/

Contact: bengal@eng.tau.ac.il
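The classification scheme summarized above can be sketched in Python. A position-specific foreground model is scored against a homogeneous variable-order background, and the decision threshold is fixed by a target true-negative rate. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `PositionModel`, `vobn_logprob`, `vom_logprob`, `score` and `tp_rate_at_tn` are hypothetical. Each position is assumed to condition on at most one other position, loosely mirroring an order-1 VOBN, and the parent structure is assumed to be acyclic. The probability tables are assumed to be already estimated, so structure learning and context pruning are not shown.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Dist = Dict[str, float]  # nucleotide -> probability


@dataclass
class PositionModel:
    """Hypothetical conditional table for one motif position.

    `default` is used when no listed context matches (order 0, like a PWM column).
    `parent` is another motif position, not necessarily adjacent; `contexts` maps the
    nucleotide observed there to P(x_i | x_parent). Nucleotides missing from
    `contexts` fall back to `default`, so the effective order depends on the context.
    """
    default: Dist
    parent: Optional[int] = None
    contexts: Dict[str, Dist] = field(default_factory=dict)

    def prob(self, seq: str, i: int) -> float:
        if self.parent is not None and seq[self.parent] in self.contexts:
            return self.contexts[seq[self.parent]][seq[i]]
        return self.default[seq[i]]


def vobn_logprob(seq: str, model: List[PositionModel]) -> float:
    """Log-likelihood under the inhomogeneous (position-specific) foreground model."""
    assert len(seq) == len(model), "sequence length must match the motif length"
    return sum(math.log(pm.prob(seq, i)) for i, pm in enumerate(model))


def vom_logprob(seq: str, tree: Dict[str, Dist], max_order: int) -> float:
    """Log-likelihood under an assumed homogeneous VOM background.

    `tree` maps contexts (the preceding nucleotides) to next-symbol distributions
    and must contain the root context "". At each position the longest stored
    suffix of the preceding symbols, up to `max_order` symbols, is used.
    """
    total = 0.0
    for i, x in enumerate(seq):
        for k in range(min(max_order, i), -1, -1):
            ctx = seq[i - k:i]
            if ctx in tree:
                total += math.log(tree[ctx][x])
                break
    return total


def score(seq: str, fg: List[PositionModel], bg_tree: Dict[str, Dist], bg_order: int) -> float:
    """Log-likelihood ratio; positive values favour the binding-site model."""
    return vobn_logprob(seq, fg) - vom_logprob(seq, bg_tree, bg_order)


def tp_rate_at_tn(pos_scores: List[float], neg_scores: List[float], tn_rate: float = 0.999) -> float:
    """TP rate when the threshold rejects at least `tn_rate` of the negatives."""
    neg = sorted(neg_scores)
    k = max(0, math.ceil(tn_rate * len(neg)) - 1)
    threshold = neg[k]  # sequences scoring above this are called binding sites
    return sum(s > threshold for s in pos_scores) / len(pos_scores)


# Toy 3-position motif (made-up numbers, for illustration only).
uniform = {b: 0.25 for b in "ACGT"}
fg = [
    PositionModel(default={"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1}),
    PositionModel(default=uniform),
    # Position 2 depends on position 0 only when position 0 holds "A".
    PositionModel(default=uniform, parent=0,
                  contexts={"A": {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85}}),
]
print(score("ACT", fg, {"": uniform}, 5))  # log(0.7*0.25*0.85 / 0.25**3) ≈ 2.2534
```

The VOBN-specific behaviour sits in `PositionModel.prob`. The conditioning position is consulted only for the contexts that are listed. A PWM column is the special case with no parent, and a fixed dependency is the case in which every nucleotide at the parent has its own table.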