Approximate Matching of Run-Length Compressed Strings

  • Authors:
  • Veli Mäkinen; Gonzalo Navarro; Esko Ukkonen

  • Venue:
  • CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
  • Year:
  • 2001

Abstract

We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of lengths m and n, compressed to m′ and n′ runs. We extend an existing algorithm for the LCS to the Levenshtein distance, achieving O(m′n + n′m) complexity. This approach also gives an algorithm for approximate searching of a pattern of m letters (m′ runs) in a text of n letters (n′ runs) in O(mm′n′) time, for both the LCS and Levenshtein models. We then propose improvements to a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m′n′) expected-case complexity. Experimental results are provided to support the conjecture.
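The abstract does not spell out the run-aware algorithms. As a point of reference only, here is a minimal Python sketch of the uncompressed baselines they improve on: run-length encoding (which determines the run counts m′ and n′), the classic O(mn) dynamic programs for Levenshtein distance and LCS length, and Sellers-style approximate search. These are standard textbook techniques, not the authors' compressed method, and the function names (`rle`, `levenshtein`, `lcs_length`, `approx_search`) are hypothetical.

```python
from itertools import groupby


def rle(s: str) -> list[tuple[str, int]]:
    """Run-length encode s as (letter, run length) pairs; len(rle(s)) is the run count."""
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]


def levenshtein(a: str, b: str) -> int:
    """Baseline O(|a|*|b|) edit distance over the *uncompressed* strings."""
    prev = list(range(len(b) + 1))  # row 0: distance from '' to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # delete ca
                         cur[j - 1] + 1,            # insert cb
                         prev[j - 1] + (ca != cb))  # match or substitute
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Baseline O(|a|*|b|) longest common subsequence length."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Sellers-style search: 1-based text positions j such that some substring
    ending at j is within Levenshtein distance k of pattern. O(m*n) time."""
    col = list(range(len(pattern) + 1))  # column for the empty text prefix
    hits = []
    for j, ct in enumerate(text, start=1):
        new = [0]  # row 0 is zero: an occurrence may start anywhere in the text
        for i, cp in enumerate(pattern, start=1):
            new.append(min(col[i] + 1, new[i - 1] + 1, col[i - 1] + (cp != ct)))
        col = new
        if col[-1] <= k:
            hits.append(j)
    return hits
```

For example, `rle("aaabbc")` yields three runs, so m = 6 and m′ = 3, and `approx_search("abc", "xxabdxx", 1)` reports end positions 4 and 5 (substrings "ab" and "abd"). The baselines fill all mn table cells; the stated O(m′n + n′m) bound is consistent with evaluating only the rows and columns at run boundaries (m′ rows of length n plus n′ columns of length m).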