Currently, the best algorithms for predicting transcription factor binding sites in sequences of regulatory DNA are severely limited in accuracy. In this paper, we integrate the outputs of 12 original binding site prediction algorithms and use a 'window' of consecutive predictions to place each result in the context of its neighbours. To balance the training data, we combine under-sampling, by either random selection or Tomek links, with SMOTE over-sampling. In addition, we investigate the behaviour of four feature selection filtering methods: bi-normal separation, correlation coefficients, F-score and a cross-entropy-based algorithm. Finally, we remove some of the final predicted binding sites on the basis of their biological plausibility. The results show that the integrated prediction significantly improves on the performance of any one of the individual algorithms.
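To make the windowing and over-sampling steps concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes that each sequence position carries one score per base algorithm, that positions beyond the sequence ends are zero-padded, and that synthetic minority samples are drawn SMOTE-style by interpolating between a sample and one of its nearest minority neighbours. The names `window_features` and `smote`, the toy scores and the choice of positive rows are all hypothetical; Tomek links under-sampling and feature selection are omitted.

```python
import random


def window_features(scores, half_width):
    """Concatenate each position's scores with those of its neighbours.

    scores: list of per-position rows, one score per base algorithm.
    Positions beyond the sequence ends are zero-padded (an assumption).
    """
    n_algos = len(scores[0])
    pad = [0.0] * n_algos
    windowed = []
    for i in range(len(scores)):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.extend(scores[j] if 0 <= j < len(scores) else pad)
        windowed.append(row)
    return windowed


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    sample and one of its k nearest minority-class neighbours.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [m for m in minority if m is not base]
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data: 4 positions, 2 base algorithms (the paper integrates 12).
scores = [[0.1, 0.2], [0.9, 0.8], [0.7, 0.6], [0.0, 0.1]]
windowed = window_features(scores, half_width=1)
positives = [windowed[1], windowed[2]]  # hypothetical binding-site rows
augmented = positives + smote(positives, n_new=3, k=1)
```

With `half_width=1`, each windowed row holds the scores of three consecutive positions, so a downstream classifier sees every prediction together with its immediate neighbours. The synthetic rows lie on line segments between existing positive rows, which is the core idea of SMOTE; a real pipeline would apply this to training data only.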