Using Real-Valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions

Authors:
Mark Robinson;Offer Sharabi;Yi Sun;Rod Adams;Rene Boekhorst;Alistair G. Rust;Neil Davey
Affiliations:
University of Hertfordshire, College Lane, Hatfield, Hertfordshire AL10 9AB, Great Britain (all authors)
Venue:
ICANNGA '07 Proceedings of the 8th international conference on Adaptive and Natural Computing Algorithms, Part I
Year:
2007

Abstract

The best current algorithms for transcription factor binding site prediction are severely limited in accuracy. However, a non-linear combination of these algorithms could improve the quality of predictions. A support vector machine was applied to combine the predictions of 12 key real-valued algorithms. The data was divided into a training set and a test set, the latter constructed in two versions: filtered and unfiltered. In addition, "windows" of consecutive results, of varying size, were used in the input vector in order to place each prediction in the context of its neighbours. Finally, classification results were improved with the aid of under- and over-sampling techniques. Our major finding is that we can reduce the false-positive rate significantly. We also found that the bigger the window, the higher the F-score, but the more likely the classifier is to make a false-positive prediction, with the best trade-off being a window size of about 7.
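
The windowing idea and the two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (window_features, f_score_and_fp_rate), the centred odd-sized window, and the zero padding at the sequence ends are all assumptions made for illustration. Each sequence position is assumed to carry one real-valued score per base algorithm, and the input vector for a position is the concatenation of the score vectors inside its window.

```python
from typing import List, Sequence, Tuple


def window_features(scores: Sequence[Sequence[float]], window: int = 7) -> List[List[float]]:
    """Build one input vector per position from a centred window of score vectors.

    scores[i] holds the real-valued outputs of the base algorithms at position i.
    Positions that fall outside the sequence are zero-padded (an assumption).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n_algorithms = len(scores[0]) if scores else 0
    padding = [0.0] * n_algorithms
    features = []
    for i in range(len(scores)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            # Use the neighbour's scores if it exists, otherwise the zero padding.
            row.extend(scores[j] if 0 <= j < len(scores) else padding)
        features.append(row)
    return features


def f_score_and_fp_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (F-score, false-positive rate) for binary labels coded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return f_score, fp_rate
```

Under these assumptions, 12 base algorithms and a window of 7 give an input vector of 84 values per position. Those vectors would then be passed to a support vector machine, with any under- or over-sampling applied to the training set only.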