A new family of string classifiers based on local relatedness

  • Authors:
  • Yasuto Higa; Shunsuke Inenaga; Hideo Bannai; Masayuki Takeda

  • Affiliations:
  • Department of Informatics, Kyushu University, Japan (all four authors)

  • Venue:
  • DS'06: Proceedings of the 9th International Conference on Discovery Science
  • Year:
  • 2006

Abstract

This paper introduces a new family of string classifiers based on local relatedness. We use three types of local relatedness measurements, namely longest common substrings (LCStr's), longest common subsequences (LCSeq's), and window-accumulated longest common subsequences (wLCSeq's). We show that finding the optimal classifier for two given sets of strings (a positive set and a negative set) is NP-hard for all of the above measurements. To obtain practically efficient algorithms for finding the best classifier, we investigate pruning heuristics and fast string matching techniques based on the properties of the local relatedness measurements.
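As a rough illustration of two of the measurements, the sketch below computes LCStr and LCSeq lengths with the textbook dynamic programs. The abstract does not spell out the exact classifier form, so the sketch *assumes* a classifier is a pattern `p` with a threshold `k` that labels a string positive when the measurement is at least `k`, and it scores a classifier by counting correctly labeled strings. The function names (`lcstr_len`, `lcseq_len`, `classify`, `score`) are hypothetical. wLCSeq is omitted because its precise definition is not given here. This brute-force evaluation is not the paper's pruning or fast-matching method.

```python
from typing import Callable, Iterable

Measure = Callable[[str, str], int]


def lcstr_len(p: str, t: str) -> int:
    """Length of the longest common substring of p and t, O(|p|*|t|) DP."""
    best = 0
    prev = [0] * (len(t) + 1)
    for i in range(1, len(p) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if p[i - 1] == t[j - 1]:
                # Extend the common suffix ending at p[i-1] and t[j-1].
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def lcseq_len(p: str, t: str) -> int:
    """Length of the longest common subsequence of p and t, O(|p|*|t|) DP."""
    prev = [0] * (len(t) + 1)
    for i in range(1, len(p) + 1):
        cur = [0] * (len(t) + 1)
        for j in range(1, len(t) + 1):
            if p[i - 1] == t[j - 1]:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def classify(pattern: str, threshold: int, text: str, measure: Measure) -> bool:
    """Hypothetical classifier: positive iff measure(pattern, text) >= threshold."""
    return measure(pattern, text) >= threshold


def score(pattern: str, threshold: int, positives: Iterable[str],
          negatives: Iterable[str], measure: Measure) -> int:
    """Hypothetical objective: number of correctly labeled strings."""
    tp = sum(classify(pattern, threshold, s, measure) for s in positives)
    tn = sum(not classify(pattern, threshold, s, measure) for s in negatives)
    return tp + tn


if __name__ == "__main__":
    print(lcstr_len("abcde", "xbcdy"))  # 3 ("bcd")
    print(lcseq_len("abcde", "aXcYe"))  # 3 ("ace")
    print(score("bcd", 3, ["abcde", "bcd"], ["bxd", "cdb"], lcstr_len))  # 4
```

Finding the best classifier would mean searching over all candidate patterns and thresholds with such a score, which is where the NP-hardness result and the pruning heuristics mentioned above come into play.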