Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling

Authors:
Dong-Jun Yu;Jun Hu;Zhen-Min Tang;Hong-Bin Shen;Jian Yang;Jing-Yu Yang
Affiliations:
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China and Changshu Institute, Nanjing University of Science and Technology, Changshu 21 ...;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China;Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, PR China;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
Venue:
Neurocomputing
Year:
2013

Abstract

Correctly localizing protein-ATP binding residues is valuable for both basic experimental biology and drug discovery. Predicting protein-ATP binding residues is a typical imbalanced learning problem, because the minority class (binding residues) is far smaller than the majority class (non-binding residues) across a sequence. Applying a traditional machine learning approach directly to this task is unsuitable, as the learned model will be severely biased towards the majority class. To circumvent this problem, a modified AdaBoost ensemble scheme based on random under-sampling is developed. In addition, the effectiveness of different features for predicting protein-ATP binding residues is systematically analyzed, and a method for objectively reporting evaluation results under the imbalanced learning scenario is discussed. Experimental results on three benchmark datasets show that the proposed method achieves higher prediction accuracy. The proposed method, called TargetATP, has been implemented in the Java programming language and is distributed via Java Web Start technology. TargetATP and the datasets used are freely available at http://www.csbio.sjtu.edu.cn/bioinf/targetATP/ for academic use.
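
The random under-sampling idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the TargetATP implementation (which is written in Java) and does not reproduce the authors' modified AdaBoost scheme; the function names `random_under_sample` and `make_balanced_subsets`, the 0/1 label encoding, and the toy class sizes are assumptions made for illustration only. Each balanced subset keeps every minority-class sample and an equally sized random draw from the majority class; in an ensemble setting, one base classifier (for example an SVM) would presumably be trained per subset.

```python
import random


def random_under_sample(labels, rng):
    """Return sorted indices of a balanced subset of the data.

    Keeps every sample of the smaller class and draws, without
    replacement, an equally sized random subset of the larger class.
    Labels are assumed to be 0 (non-binding) or 1 (binding).
    """
    ones = [i for i, y in enumerate(labels) if y == 1]
    zeros = [i for i, y in enumerate(labels) if y == 0]
    # Treat whichever class is smaller as the minority class.
    minority, majority = (ones, zeros) if len(ones) <= len(zeros) else (zeros, ones)
    sampled_majority = rng.sample(majority, len(minority))
    return sorted(minority + sampled_majority)


def make_balanced_subsets(labels, n_subsets, seed=0):
    """Draw several independent balanced subsets, one per ensemble member."""
    rng = random.Random(seed)
    return [random_under_sample(labels, rng) for _ in range(n_subsets)]


# Toy data (hypothetical): 5 binding residues among 100 residues.
labels = [1] * 5 + [0] * 95
subsets = make_balanced_subsets(labels, n_subsets=3)

for idx in subsets:
    n_positive = sum(labels[i] for i in idx)
    print(len(idx), n_positive)  # each subset: 10 samples, 5 of them positive
```

Because each subset discards most majority-class samples, drawing several subsets and combining the resulting classifiers is one common way to limit the information lost by under-sampling.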