Improving transcription factor binding site predictions by using randomised negative examples

Authors:
Faisal Rezwan;Yi Sun;Neil Davey;Rod Adams;Alistair G. Rust;Mark Robinson
Affiliations:
School of Computer Science, University of Hertfordshire, Hatfield, Hertfordshire, UK;School of Computer Science, University of Hertfordshire, Hatfield, Hertfordshire, UK;School of Computer Science, University of Hertfordshire, Hatfield, Hertfordshire, UK;School of Computer Science, University of Hertfordshire, Hatfield, Hertfordshire, UK;Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK;Benaroya Research Institute at Virginia Mason, WA
Venue:
IPCAT'12 Proceedings of the 9th international conference on Information Processing in Cells and Tissues
Year:
2012

Abstract

It is known that much of the genetic change underlying morphological evolution takes place in cis-regulatory regions rather than in the coding regions of genes. Identifying the binding sites in these regions of a genome is a non-trivial problem. Experimental methods for finding binding sites exist, but they are limited in applicability, accuracy, availability or cost. Computational prediction algorithms, on the other hand, perform rather poorly. The aim of this research is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms with other sources of complementary biological evidence, with particular emphasis on the use of the Support Vector Machine (SVM). Data from two organisms, yeast and mouse, were used in this study. The initial results were not particularly encouraging, as the predictions were still of low quality. However, when the vectors labelled as non-binding sites in the training set were replaced by randomised training vectors, a significant improvement in performance was observed: a substantial improvement on the yeast data and an even greater improvement on the mouse data. In fact, the resulting classifier found over 80% of the binding sites in the test set (recall), and 80% of its predictions were correct (precision).
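
The two ideas in the abstract, replacing the negative training vectors with randomised ones and scoring the classifier by the fraction of binding sites found and the fraction of correct predictions, can be illustrated with a minimal sketch. The abstract does not say how the randomised vectors were generated, so the scheme below (drawing each feature independently from the values observed in that feature column) is only an assumption, and the function names `randomise_negatives` and `precision_recall` are hypothetical.

```python
import random


def randomise_negatives(vectors, labels, seed=0):
    """Return a copy of the training set with every negative vector replaced.

    ASSUMPTION: each feature of a replacement vector is drawn independently
    from the values observed in that feature column. The abstract does not
    specify the randomisation scheme; this is one plausible choice.
    Labels: 1 = binding site, 0 = non-binding site.
    """
    rng = random.Random(seed)
    columns = list(zip(*vectors))  # one tuple of observed values per feature
    randomised = []
    for vector, label in zip(vectors, labels):
        if label == 1:
            randomised.append(list(vector))  # positives are kept unchanged
        else:
            randomised.append([rng.choice(column) for column in columns])
    return randomised


def precision_recall(y_true, y_pred):
    """Precision: share of predicted sites that are real binding sites.
    Recall: share of real binding sites that were predicted."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

In such a setup the randomised training set would be passed to an SVM trainer, while the test set would keep its original labels so that precision and recall are measured against real annotations.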