The gapped spectrum kernel for support vector machines

  • Authors:
  • Taku Onodera;Tetsuo Shibuya

  • Affiliations:
  • Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan;Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan

  • Venue:
  • MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
  • Year:
  • 2013

Abstract

We consider the problem of classifying string data faster and more accurately. This problem arises naturally in fields that involve the analysis of huge numbers of strings, such as computational biology. Our solution, a new string kernel we call the gapped spectrum kernel, yields a sequence of kernels that interpolates between faster but less accurate string kernels, such as the spectrum kernel, and slower but more accurate ones, such as the wildcard kernel. As a result, we obtain an algorithm for computing the wildcard kernel that is provably faster than the state-of-the-art method. The recently introduced b-suffix array data structure plays an important role here. Another result is a better trade-off between the speed and accuracy of classification, which we demonstrate in a protein classification experiment.
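For orientation, the two endpoint kernels named in the abstract can be sketched naively as follows. This is not the paper's algorithm: it does not use the b-suffix array, it does not implement the gapped spectrum kernel (whose definition is not given in the abstract), and it makes no claim about running time. The function names, the `*` wildcard symbol, and the choice of an unweighted wildcard kernel (no per-wildcard decay factor) are assumptions made purely for illustration.

```python
from collections import Counter
from itertools import combinations


def spectrum_features(s, k):
    """Count every contiguous length-k substring (k-mer) of s."""
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def wildcard_features(s, k, m, wildcard="*"):
    """Count patterns obtained from each k-mer of s by masking up to m positions.

    Assumes the wildcard symbol does not occur in s, and weights all
    patterns equally (illustrative simplification).
    """
    feats = Counter()
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        for j in range(m + 1):
            for positions in combinations(range(k), j):
                chars = list(kmer)
                for p in positions:
                    chars[p] = wildcard
                feats["".join(chars)] += 1
    return feats


def dot(a, b):
    """Inner product of two sparse count vectors stored as Counters."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller feature map
    return sum(count * b[feature] for feature, count in a.items())


def spectrum_kernel(x, y, k):
    """k-spectrum kernel: inner product of k-mer count vectors."""
    return dot(spectrum_features(x, k), spectrum_features(y, k))


def wildcard_kernel(x, y, k, m):
    """Unweighted (k, m)-wildcard kernel: k-mers may match through up to m wildcards."""
    return dot(wildcard_features(x, k, m), wildcard_features(y, k, m))
```

With `m = 0` the wildcard sketch reduces to the spectrum kernel, while larger `m` lets k-mers that differ in a few positions still contribute, at the cost of enumerating many more features per k-mer. That cost gap is the speed and accuracy tension the abstract describes.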