The gapped spectrum kernel for support vector machines

Authors:
Taku Onodera;Tetsuo Shibuya
Affiliations:
Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan;Human Genome Center, Institute of Medical Science, University of Tokyo, Minato-ku, Tokyo, Japan
Venue:
MLDM'13 Proceedings of the 9th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2013

Abstract

We consider the problem of classifying string data faster and more accurately. This problem arises naturally in fields that involve the analysis of huge numbers of strings, such as computational biology. Our solution, a new string kernel we call the gapped spectrum kernel, yields what is in effect a sequence of kernels that interpolates between faster but less accurate string kernels, such as the spectrum kernel, and slower but more accurate ones, such as the wildcard kernel. As a result, we obtain an algorithm for computing the wildcard kernel that is provably faster than the state-of-the-art method. The recently introduced b-suffix array data structure plays an important role here. Another result is a better trade-off between the speed and accuracy of classification, which we demonstrate in a protein classification experiment.
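
For orientation, the fast end of the interpolation mentioned above, the spectrum kernel, can be sketched in a few lines. This is only an illustration of the standard k-spectrum kernel (the inner product of the k-mer count vectors of two strings), not the gapped spectrum kernel or the b-suffix-array algorithm of the paper, whose definitions are not given in this abstract. The function names `kmer_counts` and `spectrum_kernel` are hypothetical, and the hash-table approach is an assumption chosen for brevity.

```python
from collections import Counter


def kmer_counts(s: str, k: int) -> Counter:
    """Count every contiguous length-k substring (k-mer) of s."""
    # If len(s) < k the range is empty and the Counter stays empty.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def spectrum_kernel(s: str, t: str, k: int) -> int:
    """Standard k-spectrum kernel: inner product of k-mer count vectors.

    Illustrative sketch only; this is not the paper's gapped kernel.
    """
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # Iterate over the smaller table; a k-mer missing from either
    # string contributes 0 to the sum.
    if len(cs) > len(ct):
        cs, ct = ct, cs
    return sum(n * ct[kmer] for kmer, n in cs.items())
```

A call such as `spectrum_kernel(seq_a, seq_b, 3)` compares two sequences by their shared 3-mers. Only exact k-mer matches count here, which is what makes this kernel fast but less tolerant of small sequence differences than kernels that allow wildcards.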