Approximate string matching on Ziv-Lempel compressed text

  • Authors:
  • Juha Kärkkäinen; Gonzalo Navarro; Esko Ukkonen

  • Affiliations:
  • Department of Computer Science, University of Helsinki, Finland; Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile; Department of Computer Science, University of Helsinki, Finland

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2003

Abstract

We present the first nontrivial algorithm for approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions. On LZ78/LZW we need O(mkn + R) time in the worst case and O(k^2 n + mk min(n, (mσ)^k) + R) on average, where σ is the alphabet size. The experimental results show a practical speedup over the basic approach of up to 2X for moderate m and small k. We extend the algorithms to more general compression formats and approximate matching models.
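
To make the problem statement concrete, the following is a minimal Python sketch of a baseline, not of the algorithm the paper presents. It assumes that the "basic approach" means decompressing the text and then searching it, which the abstract does not spell out. It also assumes a simple LZ78 encoding as a list of (phrase index, character) pairs, with index 0 standing for the empty phrase. The function names `lz78_decode` and `approx_search` are hypothetical. The search is the classical column-wise dynamic programming for matching with up to k insertions, deletions and substitutions, which takes O(mu) time on the uncompressed text.

```python
def lz78_decode(pairs):
    """Decode a list of (phrase_index, char) pairs (assumed LZ78 layout).

    Phrase 0 is the empty phrase; each pair appends one new phrase made of
    an earlier phrase plus one character.
    """
    phrases = [""]
    for ref, ch in pairs:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


def approx_search(text, pattern, k):
    """Return 1-based end positions where pattern matches with <= k edits.

    Classical dynamic programming over one column: col[i] is the smallest
    edit distance between pattern[:i] and any substring of the text that
    ends at the current position.
    """
    m = len(pattern)
    col = list(range(m + 1))  # before any text: pattern[:i] costs i deletions
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            cur = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         cur + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = cur
        if col[m] <= k:
            ends.append(j)
    return ends


# Hypothetical LZ78 parse of "abracadabra": a, b, r, ac, ad, ab, ra
pairs = [(0, "a"), (0, "b"), (0, "r"), (1, "c"), (1, "d"), (1, "b"), (3, "a")]
text = lz78_decode(pairs)
exact_ends = approx_search(text, "abr", 0)
```

In this sketch u is `len(text)` and n is `len(pairs)`. The baseline's cost grows with u, whereas the bounds stated in the abstract depend on the compressed length n.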