Collage system: a unifying framework for compressed pattern matching

  • Authors:
  • Takuya Kida;Tetsuya Matsumoto;Yusuke Shibata;Masayuki Takeda;Ayumi Shinohara;Setsuo Arikawa

  • Affiliations:
  • Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan and PRESTO, Japan Science and Technology Corporation, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan;Department of Informatics, Kyushu University, 33 Fukuoka 812-8581, Japan

  • Venue:
  • Theoretical Computer Science - Selected papers in honour of Setsuo Arikawa
  • Year:
  • 2003

Abstract

We introduce a general framework that captures the essence of compressed pattern matching across various dictionary-based compression methods. It is a formal system that represents a string by a pair consisting of a dictionary D and a sequence S of phrases in D. The basic operations are concatenation, truncation, and repetition. We also propose a compressed pattern matching algorithm for the framework. The goal of compressed pattern matching, one of the most active topics in string matching, is to find all occurrences of a pattern in a compressed text without decompressing it. Our framework covers compression methods such as the Lempel-Ziv family (LZ77, LZSS, LZ78, LZW), RE-PAIR, SEQUITUR, and the static dictionary-based method. The proposed algorithm runs in O((||D|| + |S|) · height(D) + m^2 + r) time with O(||D|| + m^2) space, where ||D|| is the size of D, |S| is the number of tokens in S, height(D) is the maximum dependency of tokens in D, m is the pattern length, and r is the number of pattern occurrences. For a subclass of the framework that contains no truncation, the time complexity is O(||D|| + |S| + m^2 + r).
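
To make the (D, S) representation concrete, the following is a minimal Python sketch of a dictionary whose tokens are built by concatenation, truncation, and repetition, together with a sequence of tokens that spells out a text. The class names (Literal, Concat, Repeat, DropPrefix, DropSuffix), the string token names, and the functions expand and decode are illustrative assumptions, not notation from the paper. The sketch decompresses the text only to show what the pair denotes; it is not the matching algorithm described in the abstract, whose point is to avoid decompression.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union


# Hypothetical encoding of dictionary entries; the names are assumptions.
@dataclass(frozen=True)
class Literal:
    text: str            # a primitive symbol (or the empty string)


@dataclass(frozen=True)
class Concat:
    left: str            # name of the left token
    right: str           # name of the right token


@dataclass(frozen=True)
class Repeat:
    token: str           # name of the token to repeat
    times: int           # number of repetitions


@dataclass(frozen=True)
class DropPrefix:
    token: str           # name of the token to truncate
    count: int           # number of leading characters removed


@dataclass(frozen=True)
class DropSuffix:
    token: str           # name of the token to truncate
    count: int           # number of trailing characters removed


Rule = Union[Literal, Concat, Repeat, DropPrefix, DropSuffix]


def expand(name: str, dictionary: Dict[str, Rule],
           cache: Optional[Dict[str, str]] = None) -> str:
    """Return the string a token stands for, memoizing shared tokens."""
    if cache is None:
        cache = {}
    if name in cache:
        return cache[name]
    rule = dictionary[name]
    if isinstance(rule, Literal):
        out = rule.text
    elif isinstance(rule, Concat):
        out = expand(rule.left, dictionary, cache) + expand(rule.right, dictionary, cache)
    elif isinstance(rule, Repeat):
        out = expand(rule.token, dictionary, cache) * rule.times
    elif isinstance(rule, DropPrefix):
        out = expand(rule.token, dictionary, cache)[rule.count:]
    else:  # DropSuffix
        base = expand(rule.token, dictionary, cache)
        out = base[:len(base) - rule.count]
    cache[name] = out
    return out


def decode(dictionary: Dict[str, Rule], sequence: List[str]) -> str:
    """Concatenate the expansions of the tokens listed in the sequence."""
    cache: Dict[str, str] = {}
    return "".join(expand(name, dictionary, cache) for name in sequence)


# Example pair (D, S) using all three operations.
D: Dict[str, Rule] = {
    "a": Literal("a"),
    "b": Literal("b"),
    "ab": Concat("a", "b"),
    "abab": Repeat("ab", 2),
    "bab": DropPrefix("abab", 1),
    "aba": DropSuffix("abab", 1),
}
S: List[str] = ["abab", "bab", "a"]

text = decode(D, S)
```

Here decode(D, S) concatenates "abab", "bab", and "a". Under this reading, a subclass with no truncation would simply be a dictionary that contains no DropPrefix or DropSuffix entries.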