Efficient computations of gapped string kernels based on suffix kernel

  • Authors:
  • Chuanhuan Yin;Shengfeng Tian;Shaomin Mu;Chong Shao

  • Affiliations:
  • School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, PR China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, PR China;School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, PR China;Department of Information Security Engineering, Chinese People's Public Security University, Beijing 102614, PR China

  • Venue:
  • Neurocomputing
  • Year:
  • 2008

Quantified Score

Hi-index 0.01

Visualization

Abstract

In recent years, several approaches to computing the value of p-length gap-weighted kernel have been presented, such as trie-based approach, suffix kernel, and range sum technique. Although other approaches can achieve better performance in some cases, suffix kernel technique is still an efficient approach in this context. In this paper, we present a series of dynamic programming algorithms based on suffix kernel to compute gapped string kernels. Given strings s and t, and a gap penalty @l, all-length gap-weighted kernel can be calculated in time O(|s||t|) with our algorithms. Moreover, some new string kernels belonging to the family of gapped string kernels are presented, including all-length and p-length match-weighted kernels, and their variants. Based on the suffix kernel technique, we can compute all-length match-weighted kernel in time O(|s||t|), and then p-length kernel in time O(p|s||t|) using the relationship between all-length and p-length kernels. Furthermore, for p-length match-weighted kernel and its variant, a bit-parallel technique is used to reduce the complexity from O(p|s||t|) to O(@?pk/w@?|s||t|), where w is the word size of the machine (e.g. 32 or 64 in practice) and k is determined by the longest matching subsequence of two strings s and t. The empirical results suggest that the suffix kernel technique is an important and useful approach to computing gapped string kernels.