Exploiting sequence dependencies in the prediction of peroxisomal proteins

Authors:
Mark Wakabayashi;John Hawkins;Stefan Maetschke;Mikael Bodén
Affiliations:
ARC Centre for Complex Systems;School of Information Technology and Electrical Engineering, The University of Queensland, Australia;School of Information Technology and Electrical Engineering, The University of Queensland, Australia;School of Information Technology and Electrical Engineering, The University of Queensland, Australia
Venue:
IDEAL'05 Proceedings of the 6th international conference on Intelligent Data Engineering and Automated Learning
Year:
2005


Bioinformatics

Abstract

Prediction of peroxisomal matrix proteins generally depends on the presence of one of two distinct motifs at the end of the amino acid sequence. PTS1 peroxisomal proteins have a well-conserved tripeptide at the C-terminal end. However, the preceding residues in the sequence arguably play a crucial role in targeting the protein to the peroxisome. Previous work applying machine learning to the prediction of peroxisomal matrix proteins has failed to capitalize on the full extent of these dependencies. We benchmark a range of machine learning algorithms and show that a classifier based on the Support Vector Machine produces more accurate results when dependencies between the conserved motif and the preceding section are exploited. We publish an updated and rigorously curated data set that results in increased prediction accuracy for most tested models.
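
The abstract does not state the encoding or kernel used, so the following is only an illustrative sketch of one way a kernel-based classifier can capture dependencies between the C-terminal tripeptide and the residues before it. It assumes a one-hot encoding of a fixed C-terminal window, under which a dot product simply counts matching positions, and a degree-2 polynomial kernel, which implicitly includes products of pairs of positions. The function names, the window width of 12, and the toy sequences are all hypothetical.

```python
def c_terminal_window(sequence: str, width: int = 12) -> str:
    """Return the last `width` residues, left-padded with '-' if the sequence is shorter."""
    return sequence[-width:].rjust(width, "-")


def linear_kernel(a: str, b: str, width: int = 12) -> int:
    """Dot product of one-hot encoded windows: the number of positions with the same residue.

    Padding positions are treated as all-zero vectors, so they never count as a match.
    """
    wa, wb = c_terminal_window(a, width), c_terminal_window(b, width)
    return sum(1 for x, y in zip(wa, wb) if x == y and x != "-")


def poly2_kernel(a: str, b: str, width: int = 12) -> int:
    """Degree-2 polynomial kernel (x.y + 1)^2 on the one-hot windows.

    Expanding the square yields products of pairs of position features, so a
    classifier using this kernel can weight combinations such as
    "motif residue at position i AND upstream residue at position j".
    """
    return (linear_kernel(a, b, width) + 1) ** 2


# Toy sequences (made up for illustration) sharing only the C-terminal tripeptide SKL.
seq_a = "M" + "A" * 13 + "SKL"
seq_b = "M" + "G" * 13 + "SKL"
k_linear = linear_kernel(seq_a, seq_b)
k_poly = poly2_kernel(seq_a, seq_b)
```

A kernel function of this kind could be supplied to any SVM implementation that accepts a precomputed kernel matrix; whether the authors used this particular kernel is not something the abstract confirms.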