Theoretical Computer Science
Complete inverted files for efficient text retrieval and analysis
Journal of the ACM (JACM)
A data structure for dynamic trees
Journal of Computer and System Sciences
An algorithm for string matching with a sequence of don't cares
Information Processing Letters
Improved dynamic dictionary matching
Information and Computation
Matching a set of strings with variable length don't cares
Theoretical Computer Science
Efficient string matching: an aid to bibliographic search
Communications of the ACM
Discovering Best Variable-Length-Don't-Care Patterns
DS '02 Proceedings of the 5th International Conference on Discovery Science
Compressed indexes for dynamic text collections
ACM Transactions on Algorithms (TALG)
Online dictionary matching with variable-length gaps
SEA'11 Proceedings of the 10th international conference on Experimental algorithms
Finding overlaps within regular expressions with variable-length gaps
Proceedings of the 2013 Research in Adaptive and Convergent Systems
We present a simpler and faster solution to the problem of matching a set of patterns with variable length don't cares. Given an alphabet $\Sigma$, a pattern $p$ is a word $p_1 @ p_2 \cdots @ p_m$, where each $p_i$ is a string over $\Sigma$ called a keyword and $@ \notin \Sigma$ is a symbol called a variable length don't care (VLDC) symbol. Pattern $p$ matches a text $t$ if $t = u_0 p_1 u_1 \cdots u_{m-1} p_m u_m$ for some $u_0, \ldots, u_m \in \Sigma^*$. The problem addressed in this paper is: given a set of patterns $P$ and a text $t$, determine whether at least one pattern of $P$ matches $t$. Kucherov and Rusinowitch (1997) [9] presented an algorithm that solves the problem in time $O((|t| + |P|) \log |P|)$, where $|P|$ is the total length of the keywords over all patterns of $P$. We give a new algorithm based on the Aho-Corasick automaton. It uses a solution to the dynamic marked ancestor problem given by Chan et al. (2007) [5]. The algorithm takes $O((|t| + \|P\|) \log \kappa / \log\log \kappa)$ time, where $\|P\|$ is the total number of keywords over all patterns of $P$, and $\kappa$ is the number of distinct keywords in $P$. The algorithm is faster and simpler than the previous approach.
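To make the problem statement concrete, the following is a minimal Python sketch of the matching definition and of the three size measures used in the bounds. It is a naive baseline written for illustration, not the Aho-Corasick and marked-ancestor algorithm described in the abstract. The function names (`parse_pattern`, `matches`, `any_pattern_matches`, `size_measures`) are hypothetical, and the sketch assumes patterns are given as ordinary strings in which `@` is the VLDC symbol and does not occur in the text alphabet.

```python
from typing import Iterable, List, Tuple

VLDC = "@"  # assumption: the VLDC symbol, not a member of the text alphabet


def parse_pattern(pattern: str) -> List[str]:
    """Split a pattern p_1@p_2...@p_m into its keywords p_1, ..., p_m."""
    return pattern.split(VLDC)


def matches(keywords: List[str], text: str) -> bool:
    """Check whether text = u_0 p_1 u_1 ... p_m u_m for some strings u_i.

    Taking the leftmost occurrence of each keyword after the end of the
    previous one is sufficient: an earlier placement never rules out a
    later keyword.
    """
    pos = 0
    for kw in keywords:
        idx = text.find(kw, pos)
        if idx < 0:
            return False
        pos = idx + len(kw)  # keywords may not overlap
    return True


def any_pattern_matches(patterns: Iterable[str], text: str) -> bool:
    """Decide whether at least one pattern of the set matches the text."""
    return any(matches(parse_pattern(p), text) for p in patterns)


def size_measures(patterns: Iterable[str]) -> Tuple[int, int, int]:
    """Return (total keyword length, total keyword count, distinct keywords)."""
    keywords = [kw for p in patterns for kw in parse_pattern(p)]
    return sum(len(kw) for kw in keywords), len(keywords), len(set(keywords))
```

For the pattern set `["ab@cd", "ab@e"]`, `size_measures` returns `(7, 4, 3)`: the total keyword length is 7, the total number of keywords is 4, and there are 3 distinct keywords. The sketch checks each pattern against the text independently, so it does not share work across patterns the way an automaton-based approach would.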