Accuracy of string kernels for protein sequence classification

  • Authors:
  • J. Dylan Spalding;David C. Hoyle

  • Affiliations:
  • School of Engineering, Computer Science & Mathematics, University of Exeter, Exeter, UK;School of Engineering, Computer Science & Mathematics, University of Exeter, Exeter, UK

  • Venue:
  • ICAPR'05 Proceedings of the Third international conference on Advances in Pattern Recognition - Volume Part I
  • Year:
  • 2005

Abstract

Determining protein sequence similarity is an important task for protein classification and homology detection. This is typically done using sequence alignment algorithms, yet fast and accurate alignment-free kernel-based classifiers exist. Viewing sequences as a “bag of words”, we test a simple weighted string kernel, investigating the effects of k-mer length, sequence length and choice of weighting. We also extend the kernel to operate on the k-mer frequency representation of a sequence rather than the “bag of words” representation.
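
To make the two representations concrete, the following is a minimal Python sketch of a weighted k-mer kernel. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the kernel's exact form or the weighting schemes tested, so the function names (`kmer_counts`, `kmer_frequencies`, `weighted_kernel`), the inner-product form of the kernel, and the default uniform weight are all hypothetical choices made here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq (the "bag of words" view)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_frequencies(seq, k):
    """Normalise k-mer counts by the total number of k-mers in seq."""
    counts = kmer_counts(seq, k)
    total = sum(counts.values())
    if total == 0:  # sequence shorter than k: no k-mers to count
        return {}
    return {kmer: c / total for kmer, c in counts.items()}


def weighted_kernel(feat_x, feat_y, weight=None):
    """Weighted inner product over the k-mers shared by two feature maps.

    `weight` maps a k-mer to a non-negative number. The uniform default
    is an assumption for illustration; the paper's weightings are not
    given in the abstract.
    """
    if weight is None:
        weight = lambda kmer: 1.0
    shared = feat_x.keys() & feat_y.keys()
    return sum(weight(m) * feat_x[m] * feat_y[m] for m in shared)


if __name__ == "__main__":
    x, y, k = "MKVLAAGIVK", "MKVLGAGIVR", 3
    print(weighted_kernel(kmer_counts(x, k), kmer_counts(y, k)))
    print(weighted_kernel(kmer_frequencies(x, k), kmer_frequencies(y, k)))
```

In this sketch the count-based kernel grows with sequence length, since longer sequences contain more k-mers, whereas the frequency-based variant divides that effect out. Passing a different `weight` callable, for example one that down-weights very common k-mers, changes the weighting without touching the rest of the code.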