Optimal combinations of pattern classifiers
Pattern Recognition Letters
Machine Learning
Combining labeled and unlabeled data with co-training
COLT' 98 Proceedings of the eleventh annual conference on Computational learning theory
Bioinformatics
A brief introduction to boosting
IJCAI'99 Proceedings of the 16th international joint conference on Artificial intelligence - Volume 2
Hi-index | 0.00 |
Understanding transcriptional regulation requires a reliable identification of the DNA binding sites that are recognized by each transcription factor (TF). Building an accurate bioinformatic model of TF-DNA binding is an essential step to differentiate true binding targets from spurious ones. Conventional approches of binding site prediction are based on the notion of consensus sequences. They are formalized by the so-called position-specific weight matrices (PWM) and rely on the statistical analysis of DNA sequence of known binding sites. To improve these techniques, we propose to use genome organization knowledge about the optimal positioning of co-regulated genes along the whole chromosome. For this purpose, we use learning machine approaches to optimally combine sequence information with positioning information. We present a new learning algorithm called PreCisIon, which relies on a TF binding classifier that optimally combines a set of PWMs and chrommosal position based classifiers. This non-linear binding decision rule drastically reduces the rate of false positives so that PRECISION consistently outperforms sequence-based methods. This is shown by implementing a cross-validation analysis in two model organisms: Escherichia coli and Bacillus Subtilis. The analysis is based on the identification of binding sites for 24 TFs; PRECISION achieved on average an AUC (aera under the curve) of 70% and 60%, a sensitivity of 80% and 70%, and a specificity of 60% and 56% for B. subtilis and E. coli, respectively.