Correctly localizing protein-ATP binding residues is valuable for both basic experimental biology and drug discovery. Protein-ATP binding residue prediction is a typical imbalanced learning problem: the minority class (binding residues) is far smaller than the majority class (non-binding residues) across an entire protein sequence. Applying traditional machine learning approaches directly to this task is unsuitable, because the learned models will be severely biased towards the majority class. To circumvent this problem, a modified AdaBoost ensemble scheme based on random under-sampling is developed. In addition, the effectiveness of different features for protein-ATP binding residue prediction is systematically analyzed, and a method for objectively reporting evaluation results in the imbalanced learning scenario is discussed. Experimental results on three benchmark datasets show that the proposed method achieves higher prediction accuracy. The proposed method, called TargetATP, has been implemented in the Java programming language and is distributed via Java Web Start technology. TargetATP and the datasets used are freely available at http://www.csbio.sjtu.edu.cn/bioinf/targetATP/ for academic use.
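The abstract does not specify how the AdaBoost scheme is modified. As a rough illustration of the general idea only, here is a minimal Python sketch of boosting with random under-sampling, closer in spirit to the published RUSBoost algorithm than to TargetATP itself. In each round, every minority sample is combined with an equal-sized random draw from the majority class, and a weak learner is trained on that balanced subset. The weak learner (a decision stump), all function names, the round count, and the toy data are hypothetical. The Matthews correlation coefficient (MCC) is included only as one example of an imbalance-aware metric; the abstract does not say which metrics TargetATP reports.

```python
import math
import random

# Hypothetical sketch of AdaBoost with random under-sampling (RUSBoost-style).
# NOT TargetATP's exact "modified AdaBoost": the stump learner and all names are
# illustrative assumptions. Labels: +1 = minority (binding), -1 = majority.


def stump_predict(stump, x):
    """Predict +1/-1 with a single-feature threshold rule."""
    j, thr, pol = stump
    return pol * (1 if x[j] > thr else -1)


def train_stump(X, y, w):
    """Return the (feature, threshold, polarity) with minimal weighted error, or None."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((j, thr, pol), xi) != yi)
                if err < best_err:
                    best, best_err = (j, thr, pol), err
    return best


def rus_adaboost(X, y, rounds=20, seed=0):
    """Train a list of (alpha, stump) pairs, each on a balanced random subset."""
    rng = random.Random(seed)
    n = len(y)
    w = [1.0 / n] * n
    minority = [i for i in range(n) if y[i] == 1]
    majority = [i for i in range(n) if y[i] == -1]
    ensemble = []
    for _ in range(rounds):
        # Random under-sampling: all minority samples plus an equal-size draw
        # (without replacement) from the majority class.
        idx = minority + rng.sample(majority, min(len(minority), len(majority)))
        stump = train_stump([X[i] for i in idx], [y[i] for i in idx],
                            [w[i] for i in idx])
        if stump is None:
            continue
        preds = [stump_predict(stump, x) for x in X]
        # Weighted error is measured on the FULL training set.
        eps = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
        if eps >= 0.5:
            continue  # no better than chance under current weights: skip round
        eps = max(eps, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, stump))
        # Standard AdaBoost re-weighting, then renormalise weights to sum to 1.
        w = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(w, y, preds)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote; ties go to the majority class (-1)."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in ensemble)
    return 1 if score > 0 else -1


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; 0.0 when any marginal count is zero."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == -1 and p == -1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy 1-D data: 200 majority points below 0.5, 20 minority points from 0.6 up.
    X = [[i / 400] for i in range(200)] + [[0.6 + i / 100] for i in range(20)]
    y = [-1] * 200 + [1] * 20
    model = rus_adaboost(X, y, rounds=20, seed=1)
    y_hat = [predict(model, x) for x in X]
    print("MCC:", round(mcc(y, y_hat), 3))
```

Measuring the weighted error on the full training set lets the majority samples left out of a given round's subset still influence re-weighting. MCC is used here because plain accuracy is misleading under imbalance: on the toy data, labelling every residue as non-binding scores 200/220 (about 90.9%) accuracy, yet its MCC is 0.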