Prediction of disulfide bonding pattern based on a support vector machine and multiple trajectory search

  • Authors:
  • Hsuan-Hung Lin; Lin-Yu Tseng

  • Affiliations:
  • Department of Applied Mathematics, National Chung Hsing University, 250, Kuo Kuang Road, Taichung 402, Taiwan, ROC and Department of Management Information System, Central Taiwan University of Sci ...;Department of Computer Science and Communication Engineering, Providence University, 200, Chung Chi Road, Taichung 433, Taiwan, ROC and Department of Computer Science and Engineering, National Chu ...

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2012

Abstract

Accurately predicting the connectivity pattern of disulfide bridges can significantly reduce the conformational search space and thus help to solve the protein-folding problem. An effective means of predicting disulfide connectivity patterns therefore facilitates the estimation of a protein's three-dimensional structure and its function. To our knowledge, given prior knowledge of the bonding states of cysteines, the highest accuracy rate reported in the literature for predicting the overall disulfide connectivity pattern (Q_p) is 74.4% on dataset SP39, which is conventionally adopted as a benchmark for disulfide connectivity prediction. This work presents a novel classifier based on the support vector machine (SVM) that incorporates four kinds of features: the position-specific scoring matrix (PSSM), normalized bond lengths, the predicted secondary structure of the protein, and indices for the physicochemical properties of amino acids. The SVM is trained to derive the connectivity probabilities of cysteine pairs. Additionally, an evolutionary algorithm called multiple trajectory search (MTS) is integrated with the SVM model to tune the SVM parameters and the window sizes for the above features. The disulfide connectivity pattern is then identified by applying the maximum weight perfect matching algorithm to these pair probabilities. Experimental results indicate that the accuracy rate for predicting the overall disulfide connectivity pattern (Q_p) reaches 79.8% when tested on the same dataset SP39.
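
The final step of the abstract, choosing a connectivity pattern from pairwise probabilities, can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the bonded cysteines are already known, that the SVM has produced one probability per cysteine pair, and it enumerates every perfect matching by brute force rather than using a polynomial-time matching algorithm. The function names, residue positions, and probabilities are all hypothetical.

```python
from math import inf
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(items: List[int]) -> Iterator[List[Pair]]:
    """Yield every way to split `items` into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in perfect_matchings(remaining):
            yield [(first, partner)] + tail


def best_pattern(cysteines: List[int],
                 pair_prob: Dict[Pair, float]) -> Tuple[List[Pair], float]:
    """Return the pairing with the largest summed pair probability.

    `pair_prob` is keyed by (i, j) with i < j. Brute force is an
    assumption made for brevity; it is only practical for a handful
    of disulfide bonds.
    """
    if len(cysteines) % 2 != 0:
        raise ValueError("need an even number of bonded cysteines")
    best: List[Pair] = []
    best_score = -inf
    for matching in perfect_matchings(sorted(cysteines)):
        score = sum(pair_prob[pair] for pair in matching)
        if score > best_score:
            best, best_score = matching, score
    return best, best_score


# Hypothetical example: four bonded cysteines and made-up probabilities.
positions = [3, 12, 20, 35]
probs = {
    (3, 12): 0.2, (3, 20): 0.7, (3, 35): 0.1,
    (12, 20): 0.3, (12, 35): 0.8, (20, 35): 0.4,
}
pattern, score = best_pattern(positions, probs)
print(pattern)
```

With four cysteines there are only three candidate patterns, and the sketch selects the one pairing residue 3 with 20 and residue 12 with 35. The number of perfect matchings grows as (2n-1)!! for n bonds, so a dedicated maximum weight matching routine is preferable for proteins with many disulfide bridges.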