Various sequence-similarity kernels, known as string kernels, have been introduced for use with support vector machines (SVMs) in discriminative approaches to sequence data classification. In these applications, a string kernel serves as a similarity measure between strings. In this paper, we present a new string kernel, together with several variants, suited to sequence data classification. The kernels are determined by the (possibly non-contiguous) matching subsequences of all possible lengths shared by two strings. Gaps are allowed within subsequences, and longer subsequences contribute more to the kernel value. Efficient algorithms for computing the kernels are derived using dynamic programming and bit-parallelism. In some cases, computing the kernel takes time linear in the length of the strings.
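To make the idea of a kernel built from shared, possibly non-contiguous subsequences concrete, the following is a minimal sketch of the standard all-subsequences kernel computed by dynamic programming. It is not the kernel proposed in this paper, whose exact definition and weighting are not given in the abstract. The function name `all_subsequences_kernel` and the parameter `weight` are assumptions made for illustration: each matched character pair multiplies a subsequence's contribution by `weight`, so choosing `weight > 1` is one hypothetical way to let longer subsequences contribute more.

```python
def all_subsequences_kernel(s, t, weight=1):
    """Illustrative all-subsequences kernel (NOT the paper's kernel).

    Counts pairs of index tuples (one in s, one in t) that spell the same
    subsequence, gaps allowed. A shared subsequence of length k contributes
    weight**k; the empty subsequence contributes 1.
    Runs in O(len(s) * len(t)) time and space.
    """
    n, m = len(s), len(t)
    # table[i][j] = kernel value for the prefixes s[:i] and t[:j].
    # With an empty prefix only the empty subsequence matches, so the value is 1.
    table = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Inclusion-exclusion over matches that avoid s[i-1] or avoid t[j-1].
            value = table[i - 1][j] + table[i][j - 1] - table[i - 1][j - 1]
            if s[i - 1] == t[j - 1]:
                # Matches ending at (i-1, j-1): extend any match of the
                # shorter prefixes by one character pair, scaled by weight.
                value += weight * table[i - 1][j - 1]
            table[i][j] = value
    return table[n][m]


if __name__ == "__main__":
    print(all_subsequences_kernel("ab", "ab"))            # 4
    print(all_subsequences_kernel("ab", "ab", weight=2))  # 9
```

For "ab" against itself, the shared subsequences are the empty string, "a", "b", and "ab", giving 4 with unit weight and 1 + 2 + 2 + 4 = 9 with `weight=2`. This quadratic-time table is only the plain dynamic-programming baseline; the bit-parallel techniques mentioned in the abstract are not reproduced here.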