String Kernels Based on Variable-Length-Don't-Care Patterns

  • Authors:
  • Kazuyuki Narisawa; Hideo Bannai; Kohei Hatano; Shunsuke Inenaga; Masayuki Takeda

  • Affiliations:
  • Department of Informatics, Kyushu University; Department of Informatics, Kyushu University; Department of Informatics, Kyushu University; Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan 819-0395; Department of Informatics, Kyushu University

  • Venue:
  • DS '08 Proceedings of the 11th International Conference on Discovery Science
  • Year:
  • 2008

Abstract

We propose a new string kernel based on variable-length-don't-care patterns (VLDC patterns). A VLDC pattern is an element of (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is the variable-length-don't-care symbol that matches any string in Σ*. The number of VLDC patterns matching a given string s of length n is O(2^(2n)). We present an O(n^5) algorithm for computing the kernel value. We also propose variations of the kernel which modify the relative weights of each pattern. We evaluate our kernels using a support vector machine to classify spam data.
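
To make the matching rule in the abstract concrete, the following minimal Python sketch tests whether a single VLDC pattern matches a string. It is an illustration only, not the O(n^5) kernel algorithm of the paper. It assumes that a pattern must match the whole string, with each don't-care symbol replaced by an arbitrary (possibly empty) string. The name `vldc_matches` and the use of the ASCII character "*" as a stand-in for the don't-care symbol are assumptions made for this example.

```python
import re

# Assumption: "*" stands in for the variable-length-don't-care symbol,
# so "*" must not itself be a character of the alphabet.
STAR = "*"


def vldc_matches(pattern: str, text: str) -> bool:
    """Return True if `text` can be obtained from `pattern` by replacing
    every STAR with some (possibly empty) string.

    Assumption: the pattern has to cover the whole string, not a substring.
    """
    # Split the pattern into its literal pieces, escape each piece so it is
    # matched verbatim, and let ".*" play the role of the don't-care symbol.
    literal_pieces = pattern.split(STAR)
    regex = ".*".join(re.escape(piece) for piece in literal_pieces)
    # DOTALL lets the don't-care symbol absorb newlines as well.
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


if __name__ == "__main__":
    for pattern, text in [("a*b", "aXYb"), ("a*b", "ba"), ("*ab*", "xxabyy")]:
        print(pattern, text, vldc_matches(pattern, text))
```

Under these assumptions, the pattern `a*b` matches both `ab` and `aXYb` but not `ba`, and the pattern consisting of the don't-care symbol alone matches every string, including the empty one.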