Exploiting sequence dependencies in the prediction of peroxisomal proteins

  • Authors:
  • Mark Wakabayashi;John Hawkins;Stefan Maetschke;Mikael Bodén

  • Affiliations:
  • ARC Centre for Complex Systems;School of Information Technology and Electrical Engineering, The University of Queensland, Australia;School of Information Technology and Electrical Engineering, The University of Queensland, Australia;School of Information Technology and Electrical Engineering, The University of Queensland, Australia

  • Venue:
  • IDEAL'05 Proceedings of the 6th international conference on Intelligent Data Engineering and Automated Learning
  • Year:
  • 2005

Abstract

Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct targeting motifs at one end of the amino acid sequence. PTS1 peroxisomal proteins have a well-conserved tripeptide at the C-terminal end. However, the residues preceding this tripeptide arguably play a crucial role in targeting the protein to the peroxisome. Previous work applying machine learning to the prediction of peroxisomal matrix proteins has failed to capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms and show that a classifier based on the Support Vector Machine produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We also publish an updated and rigorously curated data set that increases the prediction accuracy of most of the tested models.
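One way to make "dependencies between the conserved motif and the preceding section" concrete is an explicit feature map that, besides one-hot encoding each C-terminal position, adds conjunction features pairing every tripeptide position with every upstream position. The sketch below is illustrative only: the abstract does not specify the paper's encoding or kernel, and the window length of 12, the function names `one_hot_window` and `pts1_features`, and the zero-padding scheme are all hypothetical choices.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_window(seq: str, window: int) -> list[list[int]]:
    """One-hot encode the last `window` residues of `seq` (the C-terminal end).

    Hypothetical convention: sequences shorter than `window` are left-padded
    with all-zero rows, and non-standard residues (e.g. 'X') map to all-zero rows.
    """
    tail = seq[-window:].upper()
    rows = [[0] * len(AMINO_ACIDS) for _ in range(window - len(tail))]
    for res in tail:
        rows.append([1 if res == aa else 0 for aa in AMINO_ACIDS])
    return rows


def pts1_features(seq: str, window: int = 12, motif_len: int = 3) -> list[int]:
    """Hypothetical PTS1 feature map.

    Returns single-position one-hot features followed by conjunction features:
    one binary feature per (upstream position, residue) x (motif position, residue)
    pair, so a linear model can weight specific motif/upstream combinations.
    """
    rows = one_hot_window(seq, window)
    single = [bit for row in rows for bit in row]
    # Split the window into the upstream region and the C-terminal tripeptide.
    upstream, motif = rows[:-motif_len], rows[-motif_len:]
    pairs = [
        a * b
        for up_row, mo_row in product(upstream, motif)
        for a, b in product(up_row, mo_row)
    ]
    return single + pairs
```

For an illustrative sequence ending in the canonical PTS1 tripeptide SKL, such as `pts1_features("MAGDLLRQAVERAKSKL")`, the vector has 12 × 20 = 240 single-position features plus 9 × 3 × 400 = 10,800 conjunctions, of which exactly 12 + 27 = 39 are set. A degree-2 polynomial kernel over the single-position features implicitly spans all pairwise products, including these; the explicit map simply restricts attention to motif-by-upstream pairs.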