A faster algorithm for matching a set of patterns with variable length don't cares

Authors: Meng Zhang; Yi Zhang; Liang Hu
Affiliations:
Meng Zhang: College of Computer Science and Technology, Jilin University, Changchun, China
Yi Zhang: Department of Computer Science, Jilin Business and Technology College, Changchun, China; College of Computer Science and Technology, Jilin University, Changchun, China
Liang Hu: College of Computer Science and Technology, Jilin University, Changchun, China
Venue: Information Processing Letters
Year: 2010

References (9)

Transducers and repetitions. Theoretical Computer Science.
Complete inverted files for efficient text retrieval and analysis. Journal of the ACM (JACM).
A data structure for dynamic trees. Journal of Computer and System Sciences.
An algorithm for string matching with a sequence of don't cares. Information Processing Letters.
Improved dynamic dictionary matching. Information and Computation.
Matching a set of strings with variable length don't cares. Theoretical Computer Science.
Efficient string matching: an aid to bibliographic search. Communications of the ACM.
Discovering Best Variable-Length-Don't-Care Patterns. DS '02: Proceedings of the 5th International Conference on Discovery Science.
Compressed indexes for dynamic text collections. ACM Transactions on Algorithms (TALG).

Cited by (2)

Online dictionary matching with variable-length gaps. SEA'11: Proceedings of the 10th International Conference on Experimental Algorithms.
Finding overlaps within regular expressions with variable-length gaps. Proceedings of the 2013 Research in Adaptive and Convergent Systems.


Abstract

We present a simpler and faster solution to the problem of matching a set of patterns with variable length don't cares. Given an alphabet $\Sigma$, a pattern $p$ is a word $p_1 @ p_2 \cdots @ p_m$, where each $p_i$ is a string over $\Sigma$ called a keyword and $@ \notin \Sigma$ is a symbol called a variable length don't care (VLDC) symbol. Pattern $p$ matches a text $t$ if $t = u_0 p_1 u_1 \cdots u_{m-1} p_m u_m$ for some $u_0, \ldots, u_m \in \Sigma^*$. The problem addressed in this paper is: given a set of patterns $P$ and a text $t$, determine whether any pattern of $P$ matches $t$. Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in $O((|t| + |P|)\log|P|)$ time, where $|P|$ is the total length of the keywords in all patterns of $P$. We give a new algorithm based on the Aho-Corasick automaton. It uses the solution to the dynamic marked ancestor problem by Chan et al. (2007) [5]. The algorithm takes $O((|t| + \|P\|)\log\kappa / \log\log\kappa)$ time, where $\|P\|$ is the total number of keywords in all patterns of $P$ and $\kappa$ is the number of distinct keywords in $P$. The algorithm is both faster and simpler than the previous approach.
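To make the decision problem concrete, here is a minimal Python sketch. It is not the authors' algorithm: it builds a plain Aho-Corasick automaton over the distinct keywords and lets each pattern wait for its next keyword, accepting the first occurrence that starts after the previous keyword ended. It does not use a dynamic marked ancestor structure and does not attain the time bound stated above. The function names `build_aho_corasick` and `match_any`, and the representation of a pattern as a list of non-empty keywords with the `@` separators left implicit, are hypothetical choices made for illustration.

```python
from collections import deque


def build_aho_corasick(keywords):
    """Build a plain Aho-Corasick automaton over a list of non-empty keywords.

    Returns (goto, fail, out): goto[s] maps a character to the next trie
    state, fail[s] is the failure link of state s, and out[s] lists the
    indices of all keywords ending at state s, including those inherited
    along failure links.
    """
    goto, fail, out = [{}], [0], [[]]
    for idx, word in enumerate(keywords):
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)
    # Breadth-first order ensures fail[] and out[] of shallower states are
    # final before deeper states read them.
    queue = deque(goto[0].values())  # depth-1 states keep fail = 0 (root)
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def match_any(patterns, text):
    """Decide whether some VLDC pattern in `patterns` matches `text`.

    Each pattern is a list of non-empty keywords [p_1, ..., p_m]. Every
    pattern waits for its next keyword and takes the earliest occurrence
    that starts after its previous keyword ended (greedy leftmost matching).
    """
    if any(not p or not all(p) for p in patterns):
        raise ValueError("patterns and keywords must be non-empty")
    keywords = sorted({k for p in patterns for k in p})
    kw_index = {k: i for i, k in enumerate(keywords)}
    goto, fail, out = build_aho_corasick(keywords)
    # waiting[i] holds (pattern id, earliest allowed start) pairs for keyword i.
    waiting = [[] for _ in keywords]
    progress = [0] * len(patterns)  # keywords matched so far, per pattern
    for pid, p in enumerate(patterns):
        waiting[kw_index[p[0]]].append((pid, 0))
    state = 0
    for end, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for kid in out[state]:  # every keyword occurrence ending at `end`
            start = end - len(keywords[kid]) + 1
            ready = [pid for pid, min_start in waiting[kid] if start >= min_start]
            waiting[kid] = [(pid, m) for pid, m in waiting[kid] if start < m]
            for pid in ready:
                progress[pid] += 1
                if progress[pid] == len(patterns[pid]):
                    return True
                # The next keyword must start strictly after position `end`.
                nxt = kw_index[patterns[pid][progress[pid]]]
                waiting[nxt].append((pid, end + 1))
    return False


if __name__ == "__main__":
    print(match_any([["ab", "cd"]], "xxabyycdzz"))  # True
    print(match_any([["cd", "ab"]], "xxabyycdzz"))  # False
    print(match_any([["aba", "aba"]], "ababa"))     # False: occurrences overlap
```

Taking the earliest qualifying occurrence of each keyword is safe because ending a keyword as early as possible leaves the most text for the remaining keywords. A pattern written as a string can be converted with `"ab@cd".split("@")`. Because the sketch inspects every keyword occurrence that the automaton reports, including keywords that no pattern is currently waiting for, its running time can exceed the bound given in the abstract.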