Length-weighted string kernels for sequence data classification

Authors:
Shengfeng Tian;Shaomin Mu;Chuanhuan Yin
Affiliations:
School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, PR China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, PR China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, PR China
Venue:
Pattern Recognition Letters
Year:
2007

Abstract

Various sequence-similarity kernels, known as string kernels, have been introduced for use with support vector machines (SVMs) in discriminative approaches to sequence data classification. In these applications, a string kernel serves as a similarity measure between strings. In this paper, we present a new string kernel and its variants suited to sequence data classification. The kernels are determined by the matching subsequences, possibly non-contiguous and of all possible lengths, shared by two strings. Gaps in subsequences are allowed, and longer subsequences contribute more to the kernel value. Efficient algorithms for computing the kernels are derived using dynamic programming and bit-parallelism. In some cases, the kernel can be computed in time linear in the length of the strings.
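
As an illustration of the general idea only, the following is a minimal sketch of a length-weighted subsequence kernel computed by dynamic programming. It is not the authors' kernel or algorithm: the abstract does not give the exact definition, so the function name, the default weight w(k) = k, and the absence of any gap penalty are all assumptions made here. The sketch counts, for each length k, the pairs of (possibly non-contiguous) index sequences in the two strings that spell the same subsequence, then sums those counts weighted by w(k). It does not use bit-parallelism and runs in roughly O(L * n * m) time, where L is the longest matched length.

```python
def length_weighted_subsequence_kernel(s, t, weight=lambda k: k):
    """Hypothetical sketch: sum over k of weight(k) * (number of matching
    pairs of length-k subsequences of s and t, gaps allowed, no gap penalty).
    The weighting scheme is an assumption, not taken from the paper."""
    n, m = len(s), len(t)
    # match[i][j] is 1 when s[i] and t[j] are the same symbol.
    match = [[1 if s[i] == t[j] else 0 for j in range(m)] for i in range(n)]

    # ending[i][j] = number of matching length-k subsequence pairs whose
    # last symbols sit exactly at s[i] and t[j]; start with k = 1.
    ending = match
    total = 0.0
    k = 1
    while k <= min(n, m):
        count_k = sum(map(sum, ending))
        if count_k == 0:
            break  # no common subsequence of length k, so none longer either
        total += weight(k) * count_k

        # prefix[i][j] = sum of ending[a][b] over all a < i and b < j.
        prefix = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n):
            for j in range(m):
                prefix[i + 1][j + 1] = (ending[i][j] + prefix[i][j + 1]
                                        + prefix[i + 1][j] - prefix[i][j])

        # Extend every length-k pair by one matching symbol to its right.
        ending = [[match[i][j] * prefix[i][j] for j in range(m)]
                  for i in range(n)]
        k += 1
    return total
```

For example, with the default weight, `length_weighted_subsequence_kernel("ab", "ab")` counts the shared subsequences "a" and "b" with weight 1 each and "ab" with weight 2. Passing a faster-growing `weight` makes long shared subsequences dominate, which is the qualitative behaviour the abstract describes.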