Using pre & post-processing methods to improve binding site predictions

Authors:
Yi Sun; Cristina González Castellano; Mark Robinson; Rod Adams; Alistair G. Rust; Neil Davey
Affiliations:
Science and Technology Research Institute, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, AL10 9AB, UK; Ignos Estudio de Ingeniería S.L., Calle San Juan, 10 La Laguna, Santa Cruz de Tenerife, Canary Islands, C.P. 38203, Spain; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA; Science and Technology Research Institute, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, AL10 9AB, UK; Institute for Systems Biology, 1441 North 34th Street, Seattle, WA 98103, USA; Science and Technology Research Institute, University of Hertfordshire, College Lane, Hatfield, Hertfordshire, AL10 9AB, UK
Venue:
Pattern Recognition
Year:
2009

Abstract

Currently, the best algorithms for transcription factor binding site prediction within sequences of regulatory DNA are severely limited in accuracy. In this paper, we integrate 12 original binding site prediction algorithms and use a 'window' of consecutive predictions to contextualise the neighbouring results. We combine under-sampling, by either random selection or Tomek links, with SMOTE over-sampling. In addition, we investigate the behaviour of four feature selection filtering methods: bi-normal separation, correlation coefficients, F-Score and a cross entropy based algorithm. Finally, we remove some of the final predicted binding sites on the basis of their biological plausibility. The results show that we can generate a new prediction that significantly improves on the performance of any one of the individual algorithms.
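
Two of the steps named in the abstract, windowing consecutive predictions and ranking features with an F-Score filter, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the default window size of 7, the choice to drop windows that would run past either end of the sequence, and the labelling of each window by its centre position are all assumptions made for illustration. The F-Score shown is the commonly used definition (between-class separation of the means divided by the sum of the within-class sample variances), which may differ in detail from the variant used in the paper.

```python
from statistics import mean, variance


def window_features(predictions, labels, size=7):
    """Concatenate `size` consecutive per-position prediction vectors.

    predictions: list of per-position lists, one score per base algorithm.
    labels: list of 0/1 labels, one per position.
    Returns (windowed vectors, label of each window's centre position).
    Assumption: windows that would run past either end are dropped.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre")
    half = size // 2
    vectors, centre_labels = [], []
    for centre in range(half, len(predictions) - half):
        window = predictions[centre - half:centre + half + 1]
        # Flatten the window into one feature vector.
        vectors.append([score for position in window for score in position])
        centre_labels.append(labels[centre])
    return vectors, centre_labels


def f_score(values, labels):
    """F-Score of one feature; each class needs at least two samples."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    overall = mean(values)
    # Separation of the class means from the overall mean.
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    # Sum of the within-class sample variances (n - 1 denominator).
    within = variance(pos) + variance(neg)
    return between / within if within > 0 else float("inf")
```

With 12 base algorithms and a window of 7 positions, each windowed vector in this sketch would hold 84 values. A filter such as `f_score` could then be applied to each of those columns in turn, keeping only the highest-scoring ones before training a classifier.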