Indexed multi-pattern matching

  • Authors:
  • Travis Gagie; Kalle Karhu; Juha Kärkkäinen; Veli Mäkinen; Leena Salmela; Jorma Tarhio

  • Affiliations:
  • Department of Computer Science and Engineering, Aalto University, Finland; Department of Computer Science and Engineering, Aalto University, Finland; Department of Computer Science, University of Helsinki, Finland; Department of Computer Science, University of Helsinki, Finland; Department of Computer Science, University of Helsinki, Finland; Department of Computer Science and Engineering, Aalto University, Finland

  • Venue:
  • LATIN'12 Proceedings of the 10th Latin American international conference on Theoretical Informatics
  • Year:
  • 2012

Abstract

If we want to search sequentially for occurrences of many patterns in a given text, then we can apply any of dozens of multi-pattern matching algorithms in the literature. As far as we know, however, no one has said what to do if we are given a compressed self-index for the text instead of the text itself. In this paper we show how to take advantage of similarities between the patterns to speed up searches in an index. For example, we show how to store a string $S[1..n]$ in $nH_k(S) + o(n(H_k(S) + 1))$ bits such that, given the LZ77 parse of the concatenation of $t$ patterns of total length $\ell$ and maximum individual length $m$, we can count the occurrences of each pattern in a total of $O((z + t) \log \ell \log m \log^{1+\epsilon} n)$ time, where $z$ is the number of phrases in the parse.
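
To make the parameter z concrete, the following minimal sketch computes a greedy LZ77 parse of a concatenation of patterns and compares the number of phrases with the total length. It is an illustration only, not the paper's algorithm or index: the function names `lz77_parse` and `lz77_decode`, the example patterns, and the particular LZ77 variant (a phrase is either a character not seen before or the longest prefix of the remaining text that also starts at an earlier position, with overlap allowed) are all assumptions made here, and the quadratic search is chosen for readability rather than speed.

```python
def lz77_parse(s):
    """Greedy LZ77 parse of s (naive quadratic search, for illustration).

    Returns a list of phrases. A phrase is either a single character
    that has no earlier occurrence, or a pair (src, length) meaning
    "copy `length` characters starting at earlier position `src`".
    """
    phrases = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        # Try every earlier starting position j and extend the match.
        for j in range(i):
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            phrases.append(s[i])          # new character
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases):
    """Rebuild the original string from the phrases of lz77_parse."""
    out = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            src, length = p
            # Copy one character at a time so overlapping copies work.
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


# Hypothetical example: three similar patterns, concatenated.
patterns = ["GATTACA", "GATTACC", "GATTACG"]
text = "".join(patterns)
parse = lz77_parse(text)
print("total length:", len(text), "phrases z:", len(parse), "patterns t:", len(patterns))
```

When the patterns share long substrings, later patterns are covered by a few long copy phrases, so z can be much smaller than the total length. That is the situation in which a bound depending on z + t rather than on the total pattern length is attractive.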