Indexed multi-pattern matching

Authors:
Travis Gagie;Kalle Karhu;Juha Kärkkäinen;Veli Mäkinen;Leena Salmela;Jorma Tarhio
Affiliations:
Department of Computer Science and Engineering, Aalto University, Finland;Department of Computer Science and Engineering, Aalto University, Finland;Department of Computer Science, University of Helsinki, Finland;Department of Computer Science, University of Helsinki, Finland;Department of Computer Science, University of Helsinki, Finland;Department of Computer Science and Engineering, Aalto University, Finland
Venue:
LATIN'12 Proceedings of the 10th Latin American international conference on Theoretical Informatics
Year:
2012

Abstract

If we want to search sequentially for occurrences of many patterns in a given text, then we can apply any of dozens of multi-pattern matching algorithms in the literature. As far as we know, however, no one has said what to do if we are given a compressed self-index for the text instead of the text itself. In this paper we show how to take advantage of similarities between the patterns to speed up searches in an index. For example, we show how to store a string $S[1..n]$ in $nH_k(S) + o(n(H_k(S) + 1))$ bits such that, given the LZ77 parse of the concatenation of $t$ patterns of total length $\ell$ and maximum individual length $m$, we can count the occurrences of each pattern in a total of $O((z + t) \log \ell \log m \log^{1+\epsilon} n)$ time, where $z$ is the number of phrases in the parse.
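
To make the parameters in the bound concrete, the following is a minimal sketch of a greedy LZ77-style parse applied to a concatenation of patterns. It is not the paper's algorithm and does not touch the index. It only illustrates how similar patterns yield a small phrase count z relative to the total length. The function name `lz77_parse`, the quadratic-time search, and the particular LZ77 variant (each phrase is the longest prefix of the remaining input that also starts at an earlier position, with self-overlap allowed, or a single fresh character) are assumptions made for illustration.

```python
def lz77_parse(s):
    """Greedy LZ77-style parse (illustrative variant, quadratic time).

    Returns a list of (start, length) phrases that together cover s.
    """
    phrases = []
    i = 0
    n = len(s)
    while i < n:
        best_len = 0
        # Try every earlier position j < i as the source of a copy.
        for j in range(i):
            k = 0
            # The source may run into the phrase being built (self-overlap).
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            best_len = max(best_len, k)
        # A character with no earlier occurrence forms a phrase of length 1.
        length = max(best_len, 1)
        phrases.append((i, length))
        i += length
    return phrases


# Hypothetical example: three highly similar patterns.
patterns = ["abab", "abab", "aba"]
concatenation = "".join(patterns)

t = len(patterns)                      # number of patterns
ell = len(concatenation)               # total length
m = max(len(p) for p in patterns)      # maximum individual length
z = len(lz77_parse(concatenation))     # number of phrases in the parse

print(t, ell, m, z)  # prints: 3 11 4 3
```

Here the concatenation has length 11 but parses into only 3 phrases, which is the kind of redundancy the (z + t) factor in the bound is meant to exploit.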