Prediction of the bonding state of cysteine residues in proteins with machine-learning methods

  • Authors:
  • Castrense Savojardo;Piero Fariselli;Pier Luigi Martelli;Priyank Shukla;Rita Casadio

  • Affiliations:
  • Biocomputing Group and Department of Computer Science, University of Bologna, Bologna, Italy;Biocomputing Group and Department of Computer Science, University of Bologna, Bologna, Italy;Biocomputing Group, University of Bologna, Bologna, Italy;Biocomputing Group and Department of Computer Science, University of Bologna, Bologna, Italy;Biocomputing Group, University of Bologna, Bologna, Italy

  • Venue:
  • CIBB'10 Proceedings of the 7th international conference on Computational intelligence methods for bioinformatics and biostatistics
  • Year:
  • 2010

Abstract

In this paper we evaluate the performance of machine-learning methods in predicting the bonding state of cysteines from protein sequence. This task is the first step toward identifying disulfide bonds in proteins. We score three approaches: 1) Hidden Support Vector Machines (HSVMs), which integrate SVM predictions with a Hidden Markov Model; 2) SVM-HMMs, which discriminatively train models that are isomorphic to a kth-order hidden Markov model; and 3) Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs), which we recently introduced. We evaluate two encoding schemes for evolutionary information, the sequence profile and the position-specific scoring matrix (PSSM), both computed with the PSI-BLAST program, and we show that all methods perform better with the PSSM encoding than with the sequence profile. Among the methods, GRHCRFs perform slightly better than the others, achieving a per-protein accuracy of 87% with a Matthews correlation coefficient (C) of 0.73. Finally, we investigate the difference between disulfide bonding state predictions in Eukaryotes and Prokaryotes. Our analysis shows that the per-protein accuracy is higher for Prokaryotic proteins than for Eukaryotic ones (0.88 vs 0.83). However, given the paucity of bonded cysteines in Prokaryotes compared with Eukaryotes, the Matthews correlation coefficient is drastically lower (0.48 vs 0.80).
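
The contrast between per-protein accuracy and the Matthews correlation coefficient reported above can be illustrated with a small sketch. It is not the authors' evaluation code. It assumes that a protein counts as correct only when every one of its cysteines is predicted correctly, and that C is computed over all cysteines pooled together; both definitions are assumptions made for illustration. The two toy datasets (`balanced`, `imbalanced`) and the function names are hypothetical and do not come from the paper.

```python
import math

def confusion(proteins):
    """Pool all cysteines and count TP, TN, FP, FN (1 = bonded, 0 = free)."""
    tp = tn = fp = fn = 0
    for true_labels, pred_labels in proteins:
        for t, p in zip(true_labels, pred_labels):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 0:
                tn += 1
            elif t == 0 and p == 1:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn

def residue_mcc(proteins):
    """Matthews correlation coefficient over all pooled cysteines."""
    tp, tn, fp, fn = confusion(proteins)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def per_protein_accuracy(proteins):
    """Fraction of proteins whose cysteines are ALL predicted correctly
    (assumed definition, see the lead-in)."""
    correct = sum(1 for true_labels, pred_labels in proteins
                  if true_labels == pred_labels)
    return correct / len(proteins)

# Hypothetical toy data: each entry is (true states, predicted states).
# Balanced set: half of the proteins carry bonded cysteines.
balanced = (
    [([1, 1], [1, 1])] * 4 + [([1, 1], [1, 0])]
    + [([0, 0], [0, 0])] * 4 + [([0, 0], [0, 1])]
)
# Imbalanced set: bonded cysteines are rare, and half of them are missed.
imbalanced = (
    [([0, 0], [0, 0])] * 8
    + [([1, 1], [1, 1])]
    + [([1, 1], [0, 0])]
)

for name, data in [("balanced", balanced), ("imbalanced", imbalanced)]:
    print(f"{name}: per-protein accuracy = {per_protein_accuracy(data):.2f}, "
          f"C = {residue_mcc(data):.2f}")
```

On these toy sets the imbalanced case reaches the higher per-protein accuracy but the lower C, because correctly labelling the abundant free cysteines inflates accuracy while each missed bonded cysteine weighs heavily on the correlation. This matches the direction of the Prokaryote versus Eukaryote comparison in the abstract, although the numbers here are illustrative only.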