Comparing SVM sequence kernels: a protein subcellular localization theme

Authors:
Lynne Davis;John Hawkins;Stefan Maetschke;Mikael Bodén
Affiliations:
The University of Queensland, QLD, Australia;The University of Queensland, QLD, Australia;The University of Queensland, QLD, Australia;The University of Queensland, QLD, Australia
Venue:
WISB '06 Proceedings of the 2006 workshop on Intelligent systems for bioinformatics - Volume 73
Year:
2006

Abstract

Kernel-based machine learning algorithms are versatile tools for analyzing biological sequence data. Specialized sequence kernels can endow Support Vector Machines with biological knowledge, enabling accurate classification of diverse sequence data. The kernels' relative strengths and weaknesses are difficult to evaluate on a single data set. We examine a range of recent kernels tailor-made for biological sequence data (the Spectrum, Mismatch, Wildcard, Substitution and Local Alignment kernels, and a new Profile-based Local Alignment kernel) on a range of classification problems (protein localization in bacteria, peroxisomal protein import signals and sub-nuclear localization). The Profile-based Local Alignment kernel ranks highest, but its computational cost is also higher than that of any other kernel in contention. The Local Alignment, Substitution and Mismatch kernels perform well consistently and tend to produce the most distinct classifications, suggesting that the exploration of new problem sets should start with these three.
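The Spectrum and Mismatch kernels both score two sequences by the k-mers (length-k substrings) they share. The Mismatch kernel additionally lets each k-mer count toward every k-mer that differs from it in at most m positions. The Python sketch below illustrates these two feature maps under stated assumptions. The function names, the defaults k = 3 and m = 1, and the 20-letter amino-acid alphabet are hypothetical choices. The naive neighborhood enumeration is for illustration only and is not the implementation evaluated in the paper.

```python
from collections import Counter
import math

# Assumed alphabet: the 20 standard amino acids (hypothetical choice for this sketch).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def spectrum_features(seq, k):
    """Count every contiguous k-mer in seq (the k-spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    # k-mers missing from fy look up as 0 in a Counter, so they add nothing.
    return sum(count * fy[kmer] for kmer, count in fx.items())


def mismatch_neighborhood(kmer, m, alphabet=AMINO_ACIDS):
    """All strings of len(kmer) within Hamming distance m of kmer (naive enumeration)."""
    neighbors = {kmer}
    for _ in range(m):
        # Each round allows one more substituted position.
        neighbors |= {s[:i] + a + s[i + 1:]
                      for s in neighbors
                      for i in range(len(s))
                      for a in alphabet}
    return neighbors


def mismatch_features(seq, k, m, alphabet=AMINO_ACIDS):
    """Each k-mer in seq adds one count to every k-mer within m mismatches of it."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for beta in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            feats[beta] += 1
    return feats


def mismatch_kernel(x, y, k=3, m=1, alphabet=AMINO_ACIDS):
    """Inner product of the (k, m)-mismatch feature vectors of x and y."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[beta] for beta, count in fx.items())


def normalized(kernel, x, y, **params):
    """Cosine-normalize a kernel so that every sequence has self-similarity 1."""
    kxx, kyy = kernel(x, x, **params), kernel(y, y, **params)
    if kxx == 0 or kyy == 0:
        return 0.0
    return kernel(x, y, **params) / math.sqrt(kxx * kyy)


print(spectrum_kernel("MKLVLA", "MKLVIA", k=3))              # 2 shared 3-mers: MKL, KLV
print(mismatch_kernel("MKLVLA", "MKLVIA", k=3, m=0))         # m = 0 reduces to the spectrum kernel
print(normalized(spectrum_kernel, "MKLVLA", "MKLVIA", k=3))  # 2 / sqrt(4 * 4) = 0.5
```

Explicit enumeration grows quickly with m and the alphabet size. For that reason, practical mismatch-kernel implementations typically traverse a trie of k-mers instead. A Gram matrix filled with any of these functions can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.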