Faster String Matching with Super-Alphabets

Authors:
Kimmo Fredriksson
Venue:
SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
Year:
2002

Citing 11
Cited 15

Improved string searching

Software—Practice & Experience
A new approach to text searching

Communications of the ACM
Fast text searching: allowing errors

Communications of the ACM
String matching in the DNA alphabet

Software—Practice & Experience
Fast and flexible word searching on compressed text

ACM Transactions on Information Systems (TOIS)
A fast string searching algorithm

Communications of the ACM
Efficient string matching: an aid to bibliographic search

Communications of the ACM
String Searching Algorithms Revisited

WADS '89 Proceedings of the Workshop on Algorithms and Data Structures
Boyer-Moore String Matching over Ziv-Lempel Compressed Text

CPM '00 Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
Approximate String Matching and Local Similarity

CPM '94 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching
A Bit-Parallel Approach to Suffix Automata: Fast Extended String Matching

CPM '98 Proceedings of the 9th Annual Symposium on Combinatorial Pattern Matching

Shift-or string matching with super-alphabets

Information Processing Letters
Multipattern string matching with q-grams

Journal of Experimental Algorithmics (JEA)
Efficient String Matching in Huffman Compressed Texts

Fundamenta Informaticae
Fast BWT in small space by blockwise suffix sorting

Theoretical Computer Science
Fast Searching in Packed Strings

CPM '09 Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching
Accelerating Boyer-Moore searches on binary texts

Theoretical Computer Science
Tuning string matching for huge pattern sets

CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Accelerating Boyer Moore searches on binary texts

CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata
Fast searching in packed strings

Journal of Discrete Algorithms
Worst case efficient single and multiple string matching in the RAM model

IWOCA'10 Proceedings of the 21st international conference on Combinatorial algorithms
Worst-case efficient single and multiple string matching on packed texts in the word-RAM model

Journal of Discrete Algorithms
Constant-Time word-size string matching

CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Efficient String Matching in Huffman Compressed Texts

Fundamenta Informaticae
A simple pattern matching algorithm for weighted sequences

Proceedings of the 2012 ACM Research in Applied Computation Symposium
Approximate pattern matching with k-mismatches in packed text

Information Processing Letters

Abstract

Given a text T[1...n] and a pattern P[1...m] over some alphabet Σ of size σ, finding the exact occurrences of P in T requires at least Ω(n log_σ m / m) character comparisons on average, as shown in [19]. Consequently, it is believed that this lower bound also implies an Ω(n log_σ m / m) lower bound for the execution time of an optimal algorithm. However, in this paper we show how to obtain an O(n/m) average time algorithm. This is achieved by slightly changing the model of computation, and with a modification of an existing algorithm. Our technique uses a super-alphabet for simulating a suffix automaton. The space usage of the algorithm is O(σm). The technique can be applied to many other string matching algorithms, including dictionary matching, which is also solved in expected time O(n/m), and approximate matching allowing k edit operations (mismatches, insertions or deletions of characters). This is solved in expected time O(nk/m) for k ≤ O(m / log_σ m). The known lower bound for this problem is Ω(n(k + log_σ m)/m), given in [6]. Finally we show how to adapt a similar technique to the shift-or algorithm, extending its bit-parallelism in another direction. This gives a speed-up by a factor s, where s is the number of characters processed simultaneously. Some of the algorithms are implemented, and we show that the methods work well in practice too. This is especially true for the shift-or algorithm, which in some cases works faster than predicted by the theory. The result is the fastest known algorithm for exact string matching for short patterns and small alphabets. All the methods and analyses assume the RAM model of computation, and that each symbol is coded in b = ⌈log_2 σ⌉ bits. They work for larger b too, but the speed-up is decreased.
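
The super-alphabet idea for shift-or can be illustrated with a small sketch. The code below is not the paper's implementation; it is a minimal Python illustration written under stated assumptions: the function name `shift_or_super`, the table layout, and the use of Python's unbounded integers in place of machine words are all choices made here for clarity. It precomputes one combined mask per s-gram (σ^s entries), so each step consumes s text characters with a single shift and OR, and the state is widened to m + s - 1 bits so that matches ending inside a block are not lost.

```python
from itertools import product


def shift_or_super(text, pattern, s=2, alphabet=None):
    """Return 0-based start positions of pattern in text.

    Illustrative shift-or variant that reads s characters per step
    (a "super-alphabet" of s-grams). Names and layout are assumptions
    of this sketch, not taken from the paper.
    """
    m = len(pattern)
    if m == 0 or s < 1:
        raise ValueError("pattern must be non-empty and s must be >= 1")
    if alphabet is None:
        alphabet = sorted(set(text) | set(pattern))

    width = m + s - 1                  # extra s-1 bits keep mid-block matches
    full = (1 << width) - 1
    low = (1 << m) - 1

    # Classic shift-or masks: bit i of B[c] is 0 iff pattern[i] == c.
    # Bits at positions >= m stay 0 so they never hide a match bit.
    B = {c: low for c in alphabet}
    for i, c in enumerate(pattern):
        B[c] &= ~(1 << i)

    # Super-alphabet table: one combined mask per s-gram.
    table = {}
    for gram in product(alphabet, repeat=s):
        mask = 0
        for c in gram:
            mask = (mask << 1) | B[c]
        table["".join(gram)] = mask

    hit_bits = ((1 << s) - 1) << (m - 1)   # bits m-1 .. m+s-2
    D = full
    hits = []
    n = len(text)
    i = 0
    while i + s <= n:
        # One shift and one OR process s characters at once.
        D = ((D << s) & full) | table[text[i:i + s]]
        if D & hit_bits != hit_bits:       # some match ends in this block
            for j in range(1, s + 1):      # j-th character of the block
                if not (D >> (m - 1 + s - j)) & 1:
                    hits.append(i + j - m)
        i += s

    # Leftover characters (fewer than s): plain one-character shift-or steps.
    for k in range(i, n):
        D = ((D << 1) & full) | B[text[k]]
        if not (D >> (m - 1)) & 1:
            hits.append(k - m + 1)
    return hits
```

For example, `shift_or_super("GATTACATTAGATTA", "ATTA", s=2)` scans the 15-character text in seven two-character steps plus one single-character step. In a word-based implementation the state must fit in one machine word, so m + s - 1 cannot exceed the word size, and the table grows as σ^s; both limits are consistent with the abstract's remark that the approach suits short patterns and small alphabets.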