Discovering Best Variable-Length-Don't-Care Patterns

  • Authors:
  • Shunsuke Inenaga; Hideo Bannai; Ayumi Shinohara; Masayuki Takeda; Setsuo Arikawa

  • Venue:
  • DS '02 Proceedings of the 5th International Conference on Discovery Science
  • Year:
  • 2002

Abstract

A variable-length-don't-care pattern (VLDC pattern) is an element of the set Π = (Σ ∪ {⋆})*, where Σ is an alphabet and ⋆ is a wildcard matching any string in Σ*. Given two sets of strings, we consider the problem of finding the VLDC pattern that is the most common to one set and the least common to the other. We present a practical algorithm that finds such best VLDC patterns exactly and is substantially sped up by pruning heuristics. We introduce two versions of our algorithm: one employs a pattern matching machine (PMM), whereas the other uses an index structure called the Wildcard Directed Acyclic Word Graph (WDAWG). In addition, we consider a more general problem of finding the best pair ⟨q, k⟩, where k is the window size that specifies the length of an occurrence of the VLDC pattern q matching a string w. We present three algorithms that solve this problem with pruning heuristics, using dynamic programming (DP), PMMs, and WDAWGs, respectively. Although the two problems are NP-hard, we experimentally show that our algorithms run remarkably fast.
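
To make the definition concrete, the following is a minimal Python sketch of VLDC matching and of picking the best pattern from a given list of candidates. It is not the paper's algorithm: it uses no PMM, no WDAWG, and no pruning, and it does not enumerate patterns. The function names (`vldc_matches`, `count_matches`, `best_pattern`) are hypothetical, the wildcard is written as `*`, a pattern is assumed to have to match the whole string, and the score (matches in the first set minus matches in the second) is only an assumed stand-in for whatever score function the paper actually uses.

```python
from typing import Iterable, Sequence


def vldc_matches(pattern: str, text: str) -> bool:
    """Return True if `pattern` matches all of `text`.

    '*' is the wildcard and stands for any string, including the empty one.
    Assumption: the alphabet does not contain the '*' character itself.
    """
    segments = pattern.split("*")
    if len(segments) == 1:
        # No wildcard at all: the pattern must equal the text.
        return pattern == text

    first, *middle, last = segments
    # The fixed prefix and suffix must fit without overlapping.
    if len(text) < len(first) + len(last):
        return False
    if not text.startswith(first) or not text.endswith(last):
        return False

    # Place each middle segment at its leftmost possible position.
    # Leftmost placement is safe because it leaves the most room
    # for the segments that follow.
    pos = len(first)
    end = len(text) - len(last)
    for seg in middle:
        idx = text.find(seg, pos, end)
        if idx == -1:
            return False
        pos = idx + len(seg)
    return True


def count_matches(pattern: str, strings: Iterable[str]) -> int:
    """Number of strings in `strings` that the pattern matches."""
    return sum(1 for s in strings if vldc_matches(pattern, s))


def best_pattern(candidates: Sequence[str],
                 positives: Sequence[str],
                 negatives: Sequence[str]) -> str:
    """Pick the candidate maximizing (positive matches - negative matches).

    The difference of counts is an illustrative score only.
    """
    return max(
        candidates,
        key=lambda p: count_matches(p, positives) - count_matches(p, negatives),
    )
```

For example, `vldc_matches("*ab*c*", "xxabyycz")` holds because `ab` and then `c` occur in that order, while `vldc_matches("a*a", "a")` does not, since the two fixed `a` characters cannot share the single character of the text.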