Regular expression searching on compressed text

  • Authors:
  • Gonzalo Navarro

  • Affiliations:
  • Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a solution to the problem of regular expression searching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text in O(2m + mn + Rm log m) worst case time. On average this drops to O(m2 + (n + Rm) log m) or O(m2 + n + Ru/n) for most regular expressions. This is the first nontrivial result for this problem. The experimental results show that our compressed search algorithm needs half the time necessary for decompression plus searching, which is currently the only alternative.