Computational Protocol for Screening GPI-anchored Proteins

  • Authors:
  • Wei Cao;Kazuya Sumikoshi;Tohru Terada;Shugo Nakamura;Katsuhiko Kitamoto;Kentaro Shimizu

  • Affiliations:
  • Department of Biotechnology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan 113-8657;Department of Biotechnology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan 113-8657;Professional Programme for Agricultural Bioinformatics, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan 113-8657;Department of Biotechnology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan 113-8657;Department of Biotechnology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan 113-8657;Department of Biotechnology, Graduate School of Agricultural and Life Sciences, University of Tokyo, Tokyo, Japan 113-8657

  • Venue:
  • BICoB '09 Proceedings of the 1st International Conference on Bioinformatics and Computational Biology
  • Year:
  • 2009

Abstract

Glycosylphosphatidylinositol (GPI) lipid modification is an important post-translational protein modification found in many organisms, and GPI anchoring is confined to the C-terminus of the target protein. We have developed a novel computational protocol for identifying GPI-anchored proteins that is more accurate than previously proposed protocols. It uses an optimized support vector machine (SVM) classifier to recognize the C-terminal sequence pattern and a voting system based on SignalP version 3.0 to determine whether the N-terminal signal typical of a GPI-anchored protein is present. Under 5-fold cross-validation, the SVM classifier achieves an accuracy of 96% and an area under the receiver operating characteristic (ROC) curve of 0.97. The protocol correctly identified 14 of the 15 proteins in our sensitivity test dataset and 19 of the 20 proteins experimentally identified by Hamada et al. that were not included in the training dataset. This suggests that the protocol generalizes well to unseen data. A proteome-wide survey applying the protocol to Saccharomyces cerevisiae identified 88 putative GPI-anchored proteins.
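
The two-stage decision and the two reported metrics can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the voting rule, the decision threshold, or how the two stages are combined, so the strict-majority rule, the threshold of 0.0, and the logical AND used here are assumptions, and all function names are hypothetical. The classifier scores and votes are taken as plain inputs; nothing here calls an SVM library or SignalP.

```python
from typing import Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True when strictly more than half of the votes are positive.

    Assumption: a strict majority rule; the abstract does not specify one.
    """
    return sum(votes) * 2 > len(votes)


def is_putative_gpi(c_term_score: float,
                    signal_votes: Sequence[bool],
                    threshold: float = 0.0) -> bool:
    """Combine the C-terminal score with the N-terminal signal vote.

    Assumption: both stages must agree (logical AND), and a score above
    `threshold` counts as a positive C-terminal call.
    """
    return c_term_score > threshold and majority_vote(signal_votes)


def accuracy(labels: Sequence[int],
             scores: Sequence[float],
             threshold: float = 0.0) -> float:
    """Fraction of examples whose thresholded score matches the 0/1 label."""
    correct = sum((s > threshold) == bool(y) for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    The AUC equals the probability that a randomly chosen positive scores
    higher than a randomly chosen negative; ties count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a 5-fold cross-validation setting, functions like `accuracy` and `roc_auc` would be applied to the scores of each held-out fold (or to the pooled held-out scores); which of the two the authors used is not stated in the abstract.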