Effect of using varying negative examples in transcription factor binding site predictions

Authors:
Faisal Rezwan;Yi Sun;Neil Davey;Rod Adams;Alistair G. Rust;Mark Robinson
Affiliations:
School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, UK;School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, UK;School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, UK;School of Computer Science, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, UK;Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK;Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing MI
Venue:
EvoBIO'11 Proceedings of the 9th European conference on Evolutionary computation, machine learning and data mining in bioinformatics
Year:
2011

Abstract

Identifying transcription factor binding sites computationally is a hard problem because existing predictors produce many false predictions. Combining the outputs of existing predictors with a classification method such as a Support Vector Machine (SVM) can improve the overall predictions. However, the conventional negative examples (that is, examples of non-binding sites) used in this type of problem are highly unreliable. In this study, we used different types of negative examples. One class of negative examples was taken from regions far from the promoters, where binding sites occur very rarely, and another was produced by randomization. We then observed the effect of using these different negative examples when predicting transcription factor binding sites in mouse. We also devised a novel cross-validation technique for this type of biological problem.
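
The abstract does not say how the randomized negative examples were produced. As a minimal sketch of one possible scheme, assume each genomic position is described by a vector of binary outputs from several base predictors, and that a negative set is built by shuffling each predictor's column independently. The function name `randomised_negatives`, the toy vectors, and the shuffling scheme itself are illustrative assumptions, not the authors' procedure.

```python
import random


def randomised_negatives(vectors, seed=0):
    """Return a copy of `vectors` with every column shuffled independently.

    Hypothetical scheme: each predictor keeps its overall hit rate, but the
    agreement between predictors at any one position is destroyed.
    """
    rng = random.Random(seed)
    # Transpose rows into columns so each predictor can be shuffled alone.
    columns = [list(col) for col in zip(*vectors)]
    for col in columns:
        rng.shuffle(col)
    # Transpose back into one vector per position.
    return [list(row) for row in zip(*columns)]


# Toy data (invented): rows are annotated binding-site positions,
# columns are the binary outputs of four base predictors.
positives = [
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
]

negatives = randomised_negatives(positives, seed=42)

# Labelled training set that could be passed to a classifier such as an SVM.
training_set = [(v, 1) for v in positives] + [(v, 0) for v in negatives]
```

Shuffling by column is only one way to randomize. Drawing vectors uniformly at random, or permuting the underlying sequence before running the base predictors, would be equally plausible readings of the abstract.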