LZgrep: a Boyer–Moore string matching tool for Ziv–Lempel compressed text: Research Articles

  • Authors:
  • Gonzalo Navarro;Jorma Tarhio

  • Affiliations:
  • Department of Computer Science, University of Chile;Department of Computer Science and Engineering, Helsinki University of Technology, Finland

  • Venue:
  • Software—Practice & Experience
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a Boyer–Moore (BM) approach to string matching over LZ78 and LZW compressed text. The idea is to search the text directly in compressed form instead of decompressing and then searching it. We modify the BM approach so as to skip text using the characters explicitly represented in the LZ78/LZW formats, modifying the basic technique where the algorithm can choose which characters to inspect. We present and compare several solutions for single and multipattern searches. We show that our algorithms obtain speedups of up to 50% compared to the simple decompress-then-search approach. Finally, we present a public tool, LZgrep, which uses our algorithms to offer grep-like capabilities directly searching files compressed using Unix's Compress, a LZW compressor. LZgrep can also search files compressed with Unix gzip, using new decompress-then-search techniques we develop, which are faster than the current tools. This way, users can always keep their files in compressed form and still search them, uncompressing only when they want to see them. Copyright © 2005 John Wiley & Sons, Ltd.