A faster algorithm for matching a set of patterns with variable length don't cares

  • Authors:
  • Meng Zhang;Yi Zhang;Liang Hu

  • Affiliations:
  • College of Computer Science and Technology, Jilin University, Changchun, China;Department of Computer Science, Jilin Business and Technology College, Changchun, China and College of Computer Science and Technology, Jilin University, Changchun, China;College of Computer Science and Technology, Jilin University, Changchun, China

  • Venue:
  • Information Processing Letters
  • Year:
  • 2010

Abstract

We present a simpler and faster solution to the problem of matching a set of patterns with variable length don't cares. Given an alphabet $\Sigma$, a pattern $p$ is a word $p_1 @ p_2 \ldots @ p_m$, where each $p_i$ is a string over $\Sigma$ called a keyword and $@ \notin \Sigma$ is a symbol called a variable length don't care (VLDC) symbol. Pattern $p$ matches a text $t$ if $t = u_0 p_1 u_1 \ldots u_{m-1} p_m u_m$ for some $u_0, \ldots, u_m \in \Sigma^*$. The problem addressed in this paper is: given a set of patterns $P$ and a text $t$, determine whether one of the patterns of $P$ matches $t$. Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in time $O((|t| + |P|) \log |P|)$, where $|P|$ is the total length of the keywords in all patterns of $P$. We give a new algorithm based on the Aho-Corasick automaton. It uses the solution to the dynamic marked ancestor problem of Chan et al. (2007) [5]. The algorithm takes $O((|t| + \|P\|) \log \kappa / \log\log \kappa)$ time, where $\|P\|$ is the total number of keywords in all patterns of $P$, and $\kappa$ is the number of distinct keywords in $P$. The algorithm is faster and simpler than the previous approach.
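
To make the matching definition concrete, the following is a minimal Python sketch of a naive baseline, not the algorithm of the paper: it uses neither the Aho-Corasick automaton nor the dynamic marked ancestor structure, and it checks each pattern separately. It relies on the fact that placing every keyword at its leftmost possible position after the previous keyword is enough to decide whether a single pattern matches. The function names `matches` and `any_pattern_matches`, and the choice to write patterns as Python strings with a literal `@` separator, are assumptions made for illustration.

```python
def matches(keywords, text):
    """Return True if text = u0 p1 u1 ... p_m u_m for the given keywords.

    Greedy check: each keyword is placed at its leftmost occurrence
    that starts at or after the end of the previous keyword.
    """
    pos = 0
    for kw in keywords:
        idx = text.find(kw, pos)   # leftmost occurrence starting at pos or later
        if idx < 0:
            return False           # this keyword cannot be placed
        pos = idx + len(kw)        # keywords may not overlap
    return True


def any_pattern_matches(patterns, text):
    """Decide whether at least one pattern of the set matches the text.

    Each pattern is a string whose keywords are separated by '@',
    the VLDC symbol (assumed not to occur in the text alphabet).
    """
    return any(matches(p.split("@"), text) for p in patterns)
```

For example, `any_pattern_matches(["ab@cd"], "xxabyycdzz")` holds, while `matches(["ab", "bc"], "abc")` does not, because the two keywords would have to overlap. This baseline rescans the text once per pattern, which is the cost that the automaton-based approach in the abstract is designed to avoid.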