In this paper we present a method to simulate, using the bit-parallelism technique, the nondeterministic Aho-Corasick automaton and the nondeterministic suffix automaton induced, respectively, by the trie and by the Directed Acyclic Word Graph for a set of patterns. When the prefix redundancy is non-negligible, this method yields, compared to the original bit-parallel encoding with no prefix factorization, a representation that requires smaller bit-vectors and, correspondingly, fewer words. In particular, if we restrict ourselves to single-word bit-vectors, more patterns can be packed into a word. We also present two simple algorithms, based on this technique, for searching a set P of patterns in a text T of length n over an alphabet Σ of size σ. Our algorithms, named Log-And and Backward-Log-And, require O((m+σ)⌈m/w⌉) space and work in O(n⌈m/w⌉) and O(n⌈m/w⌉ l_min) worst-case searching time, respectively, where w is the number of bits in a computer word, m is the number of states of the automaton, and l_min is the length of the shortest pattern in P.
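For orientation, the baseline that the abstract compares against (a bit-parallel encoding with no prefix factorization) can be sketched as a multi-pattern Shift-And search. This is not the Log-And algorithm of the paper, whose details the abstract does not give; it is a minimal illustration under the assumption that the patterns are simply concatenated, so the bit-vector needs one bit per pattern character. The function names `build_masks` and `shift_and_multi` are hypothetical, and Python's unbounded integers stand in for the ⌈m/w⌉ machine words.

```python
def build_masks(patterns):
    """Precompute bit masks over the concatenation of all patterns.

    Bit i corresponds to the i-th character of the concatenation
    (no prefix sharing, so common prefixes are stored repeatedly).
    """
    char_mask = {}  # character -> bits of the positions holding it
    init = 0        # bits of the first position of every pattern
    final = 0       # bits of the last position of every pattern
    ends = {}       # bit index of a last position -> its pattern
    pos = 0
    for p in patterns:
        if not p:
            raise ValueError("patterns must be non-empty")
        init |= 1 << pos
        for ch in p:
            char_mask[ch] = char_mask.get(ch, 0) | (1 << pos)
            pos += 1
        final |= 1 << (pos - 1)
        ends[pos - 1] = p
    return char_mask, init, final, ends


def shift_and_multi(patterns, text):
    """Return (start, pattern) for every occurrence of a pattern in text."""
    char_mask, init, final, ends = build_masks(patterns)
    state = 0  # bit i set <=> positions up to i match a suffix of the text read
    hits = []
    for j, ch in enumerate(text):
        # Advance every active position by one and restart each pattern,
        # then keep only the positions whose character equals ch.
        state = ((state << 1) | init) & char_mask.get(ch, 0)
        matched = state & final
        while matched:
            low = matched & -matched          # isolate the lowest set bit
            p = ends[low.bit_length() - 1]
            hits.append((j - len(p) + 1, p))
            matched ^= low
    return hits


if __name__ == "__main__":
    print(shift_and_multi(["he", "she", "his", "hers"], "ushers"))
```

In this sketch the patterns "he", "his" and "hers" each spend a separate bit on their shared first character "h". That repeated storage is the prefix redundancy which, according to the abstract, the proposed encoding factors out to obtain smaller bit-vectors.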