Improving protein-ATP binding residues prediction by boosting SVMs with random under-sampling

  • Authors:
  • Dong-Jun Yu;Jun Hu;Zhen-Min Tang;Hong-Bin Shen;Jian Yang;Jing-Yu Yang

  • Affiliations:
  • School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China and Changshu Institute, Nanjing University of Science and Technology, Changshu 21 ...;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China;Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, PR China;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China;School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China

  • Venue:
  • Neurocomputing
  • Year:
  • 2013

Abstract

Correctly localizing protein-ATP binding residues is valuable for both basic experimental biology and drug discovery. Predicting protein-ATP binding residues is a typical imbalanced learning problem, because the minority class (binding residues) is far smaller than the majority class (non-binding residues) across the entire sequence. Directly applying traditional machine learning approaches to this task is unsuitable, as the learned model will be severely biased towards the majority class. To circumvent this problem, a modified AdaBoost ensemble scheme based on random under-sampling is developed. In addition, the effectiveness of different features for protein-ATP binding residue prediction is systematically analyzed, and a method for objectively reporting evaluation results under the imbalanced learning scenario is discussed. Experimental results on three benchmark datasets show that the proposed method achieves higher prediction accuracy. The proposed method, called TargetATP, has been implemented in the Java programming language and is distributed via Java Web Start technology. TargetATP and the datasets used are freely available at http://www.csbio.sjtu.edu.cn/bioinf/targetATP/ for academic use.
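
To make the imbalance problem concrete, the following is a minimal Python sketch, not the TargetATP implementation (which is written in Java and trains boosted SVMs). It illustrates only two ideas from the abstract: drawing balanced training subsets by random under-sampling of the majority class, and the way plain accuracy can mislead on imbalanced data. The toy labels, the function names, and the use of the Matthews correlation coefficient (MCC) as an imbalance-aware metric are all assumptions made for illustration; the abstract does not state which evaluation measure the authors recommend.

```python
import math
import random


def random_undersample(labels, rng):
    """Return sorted indices of a balanced subset: every minority (1) sample
    plus an equally sized random draw, without replacement, from the
    majority (0) samples. Assumes the minority class is not the larger one."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    sampled_majority = rng.sample(majority, len(minority))
    return sorted(minority + sampled_majority)


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: 5 binding residues (1) among 100 residues in total.
labels = [1] * 5 + [0] * 95

# One balanced subset per (hypothetical) ensemble member.
rng = random.Random(0)
subsets = [random_undersample(labels, rng) for _ in range(3)]

# A trivial classifier that always predicts "non-binding".
all_negative = [0] * len(labels)
acc_trivial = accuracy(labels, all_negative)
mcc_trivial = mcc(labels, all_negative)
```

On this toy data the always-negative classifier scores high accuracy while finding no binding residue at all, which is the bias the abstract describes. In an ensemble scheme, each balanced subset would train one base classifier, and the base classifiers' outputs would then be combined.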