Position-aware string kernels with weighted shifts and a general framework to apply string kernels to other structured data

Authors:
Kilho Shin
Affiliations:
Carnegie Mellon CyLab, Japan
Venue:
IDEAL'07 Proceedings of the 8th international conference on Intelligent data engineering and automated learning
Year:
2007

Citing 2

A survey of kernels for structured data (ACM SIGKDD Explorations Newsletter)
RASE: recognition of alternatively spliced exons in C.elegans (Bioinformatics)

Cited 2

Polynomial summaries of positive semidefinite kernels (Theoretical Computer Science)
A generalization of Haussler's convolution kernel: mapping kernel and its application to tree kernels (Journal of Computer Science and Technology)

Abstract

In combination with efficient kernel-based learning machines such as the Support Vector Machine (SVM), string kernels have proven to be highly effective in a wide range of research areas (e.g. bioinformatics, text analysis, voice analysis). Many of the string kernels proposed so far build on simpler kernels, such as trivial comparison of characters and/or substrings, and fall into two classes: the position-aware string kernel, which takes advantage of positional information of characters/substrings in their parent strings, and the position-unaware string kernel, which does not. Although the positive semidefiniteness of kernels is a critical prerequisite for learning machines to work properly, little has been known about the positive semidefiniteness of the position-aware string kernel. The present paper is the first to present easily checkable sufficient conditions for the positive semidefiniteness of a certain useful subclass of the position-aware string kernel: kernels in which the similarity/matching of pairs of characters/substrings is evaluated with weights determined according to shifts (the differences in the positions of characters/substrings). Such string kernels have been studied in the literature, but insufficiently. In addition, we generalize our results by presenting a general framework for converting positive semidefinite string kernels into kernels for richer data structures such as trees and graphs.
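
To make the idea of shift-dependent weights concrete, the following is a minimal sketch of one possible position-aware string kernel. It is not taken from the paper: the function names, the substring length `p`, and the Gaussian weight with parameter `sigma` are all assumptions chosen for illustration, and the sketch does not reproduce the paper's sufficient conditions for positive semidefiniteness.

```python
import math


def shift_weight(shift, sigma=1.0):
    """Hypothetical weight: Gaussian decay in the shift i - j (an assumed choice)."""
    return math.exp(-(shift ** 2) / (2.0 * sigma ** 2))


def weighted_shift_kernel(x, y, p=1, sigma=1.0):
    """Sum shift-weighted exact matches over all pairs of length-p substrings.

    The base kernel is the trivial comparison of substrings (1 if equal,
    0 otherwise); each match at positions i in x and j in y contributes
    shift_weight(i - j).
    """
    total = 0.0
    for i in range(len(x) - p + 1):
        for j in range(len(y) - p + 1):
            if x[i:i + p] == y[j:j + p]:
                total += shift_weight(i - j, sigma)
    return total
```

Under these assumptions, `weighted_shift_kernel("abc", "abc")` counts only the three aligned matches, each with weight 1, whereas `weighted_shift_kernel("ab", "ba")` finds the same characters at a shift of one position and down-weights both matches. A position-unaware kernel would correspond to a weight that ignores the shift entirely.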