A Boyer-Moore Type Algorithm for Compressed Pattern Matching

  • Authors:
  • Yusuke Shibata; Tetsuya Matsumoto; Masayuki Takeda; Ayumi Shinohara; Setsuo Arikawa

  • Venue:
  • CPM '00: Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
  • Year:
  • 2000

Abstract

We apply the Boyer-Moore technique to compressed pattern matching for text strings described in terms of collage systems, a formal framework that captures various dictionary-based compression methods. For the subclass of collage systems that contain no truncation, our new algorithm runs in O(‖D‖ + n·m + m² + r) time using O(‖D‖ + m²) space, where ‖D‖ is the size of the dictionary D, n is the compressed text length, m is the pattern length, and r is the number of pattern occurrences. For a general collage system, the time complexity is O(height(D)·(‖D‖ + n) + n·m + m² + r), where height(D) is the maximum dependency of tokens in D. We show that the algorithm specialized for the so-called byte pair encoding (BPE) is very fast in practice: it runs about 1.2 to 3.0 times faster than the exact-match routine of the software package agrep, known as the fastest pattern matching tool.
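The abstract does not spell out the algorithm itself. As a rough illustration of what it means to drive Boyer-Moore-style comparisons and shifts from BPE-compressed text without decompressing it first, here is a minimal Python sketch under stated assumptions. It is not the authors' algorithm: their method shifts over whole tokens using precomputed tables, whereas this sketch runs plain Boyer-Moore-Horspool and fetches each needed character on demand by descending the BPE pair tree. That per-access descent costs O(height(D)), which loosely echoes the height(D) factor in the general-case bound. The names `bpe_compress`, `token_lengths`, `char_at` and `compressed_horspool` are hypothetical. The toy BPE (greedily replace the most frequent adjacent pair with a fresh token id 256, 257, ...) is an assumed simplification, not a specific BPE implementation.

```python
from bisect import bisect_right
from collections import Counter


def bpe_compress(text: bytes, max_rules: int = 100):
    """Toy BPE (assumed variant): repeatedly replace the most frequent adjacent
    pair with a fresh token id. Returns (tokens, rules: id -> (left, right))."""
    seq, rules, next_id = list(text), {}, 256
    while len(rules) < max_rules:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break
        rules[next_id] = pair
        out, i = [], 0
        while i < len(seq):  # greedy left-to-right replacement
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(next_id)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq, next_id = out, next_id + 1
    return seq, rules


def token_lengths(rules):
    """Length of the string each token expands to (one pass over the dictionary)."""
    length = {b: 1 for b in range(256)}
    for tok in sorted(rules):  # a rule only refers to smaller token ids
        left, right = rules[tok]
        length[tok] = length[left] + length[right]
    return length


def char_at(tok, offset, rules, length):
    """Descend the pair tree of `tok` to the byte at `offset`; O(height) steps."""
    while tok >= 256:
        left, right = rules[tok]
        if offset < length[left]:
            tok = left
        else:
            offset -= length[left]
            tok = right
    return tok


def compressed_horspool(tokens, rules, pattern: bytes):
    """Horspool search whose text characters are read from the token stream."""
    m = len(pattern)
    if m == 0:
        return []
    length = token_lengths(rules)
    starts = [0]  # starts[i] = text offset where token i begins
    for t in tokens:
        starts.append(starts[-1] + length[t])
    total = starts[-1]

    def text_at(p):
        i = bisect_right(starts, p) - 1  # token covering text position p
        return char_at(tokens[i], p - starts[i], rules, length)

    # Bad-character shift: distance from the rightmost occurrence (excluding
    # the last pattern position) to the end of the pattern.
    shift = {pattern[k]: m - 1 - k for k in range(m - 1)}
    hits, e = [], m - 1  # e = text position aligned with the pattern's last char
    while e < total:
        k = 0
        while k < m and text_at(e - k) == pattern[m - 1 - k]:
            k += 1
        if k == m:
            hits.append(e - m + 1)
        e += shift.get(text_at(e), m)
    return hits


if __name__ == "__main__":
    toks, rules = bpe_compress(b"abracadabra abracadabra")
    print(compressed_horspool(toks, rules, b"abra"))  # [0, 7, 12, 19]
```

Each character access here also pays an O(log n) binary search over token start offsets, so the sketch does not achieve the bounds stated in the abstract. It only shows that the comparisons and shifts of a Boyer-Moore-type scan can be answered from the dictionary and the compressed token sequence alone.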