Speeding Up Pattern Matching by Text Compression

  • Authors:
  • Yusuke Shibata; Takuya Kida; Shuichi Fukamachi; Masayuki Takeda; Ayumi Shinohara; Takeshi Shinohara; Setsuo Arikawa

  • Venue:
  • CIAC '00 Proceedings of the 4th Italian Conference on Algorithms and Complexity
  • Year:
  • 2000

Abstract

Byte pair encoding (BPE) is a simple universal text compression scheme. Decompression is very fast and requires little working space, and it is easy to decompress an arbitrary part of the original text. However, BPE has not been widely used, because compression is rather slow and its compression ratio is worse than that of other methods such as Lempel-Ziv type compression. In this paper, we bring out a potential advantage of BPE compression: from a practical viewpoint, it is well suited to compressed pattern matching, where the goal is to find a pattern directly in compressed text without decompressing it explicitly. We compare the running times needed to find a pattern in (1) BPE compressed files, (2) Lempel-Ziv-Welch compressed files, and (3) original text files, in various situations. Experimental results show that pattern matching in BPE compressed text is even faster than matching in the original text. Thus BPE compression reduces not only disk space but also search time.
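
For readers unfamiliar with BPE, the following is a minimal Python sketch of the general scheme: the most frequent pair of adjacent bytes is repeatedly replaced by a byte value that does not occur in the text, and each substitution is recorded as a rule. It is not the authors' implementation and does not include their compressed pattern matching algorithm. The function names (`bpe_compress`, `bpe_expand`, `bpe_decompress`) and the `max_rules` parameter are assumptions chosen for illustration.

```python
from collections import Counter


def bpe_compress(data: bytes, max_rules: int = 256):
    """Compress `data` with a simple byte pair encoding.

    Returns (compressed_bytes, rules), where `rules` maps each newly
    introduced byte value to the pair of symbols it stands for.
    """
    seq = list(data)
    used = set(data)
    # Byte values that never occur in the input are free to act as new symbols.
    unused = [b for b in range(256) if b not in used]
    rules = {}
    while unused and len(rules) < max_rules:
        # Count adjacent pairs (overlapping occurrences are counted too).
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a further rule would not help
        sym = unused.pop()
        rules[sym] = pair
        # Replace occurrences of the pair, scanning left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(sym)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return bytes(seq), rules


def bpe_expand(sym: int, rules: dict) -> bytes:
    """Expand one compressed symbol back into the original bytes."""
    stack, out = [sym], bytearray()
    while stack:
        s = stack.pop()
        if s in rules:
            left, right = rules[s]
            stack.append(right)  # push right first so left is expanded first
            stack.append(left)
        else:
            out.append(s)
    return bytes(out)


def bpe_decompress(comp: bytes, rules: dict) -> bytes:
    """Decompress a whole compressed text, or any slice of it."""
    return b"".join(bpe_expand(s, rules) for s in comp)
```

Because every compressed symbol expands independently of its neighbours, a call such as `bpe_decompress(comp[i:j], rules)` decodes only the chosen part of the text. This is the property the abstract refers to when it says an arbitrary part of the original text is easy to decompress.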