Approximate Matching of Run-Length Compressed Strings

Authors:
Veli Mäkinen, Gonzalo Navarro, Esko Ukkonen
Venue:
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Year:
2001

Abstract

We focus on the problem of approximate matching of strings that have been compressed using run-length encoding. Previous studies have concentrated on the problem of computing the longest common subsequence (LCS) between two strings of lengths m and n, compressed to m' and n' runs, respectively. We extend an existing algorithm for the LCS to the Levenshtein distance, achieving O(m'n + n'm) complexity. This approach also gives an algorithm for approximate searching of a pattern of m letters (m' runs) in a text of n letters (n' runs) in O(mm'n') time, for both the LCS and Levenshtein models. We then propose improvements to a greedy algorithm for the LCS, and conjecture that the improved algorithm has O(m'n') expected-case complexity. Experimental results are provided to support the conjecture.
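An O(m'n + n'm) bound of this kind comes from evaluating the dynamic-programming matrix only on the rows and columns where a run ends, so each pair of runs (lengths p and q) costs O(p + q) work instead of O(pq). The Python sketch below illustrates that idea for unit-cost Levenshtein distance only. It is a minimal sketch under that assumption, not the authors' algorithm, and all names (`rle`, `window_min`, `suffix_min`, `block`, `rle_edit_distance`) are hypothetical. It uses two standard facts about the unit-cost table: adjacent cells differ by at most 1, so inside a block of equal letters every cell copies its diagonal predecessor; and inside a block of different letters every step costs 1, so crossing dr rows and dc columns costs max(dr, dc), which lets the block's output borders be computed from sliding-window and suffix minima of its input borders.

```python
from collections import deque
from itertools import groupby


def rle(s):
    """Run-length encode s as a list of (letter, run_length) pairs."""
    return [(ch, len(list(grp))) for ch, grp in groupby(s)]


def window_min(vals, width):
    """out[k] = min(vals[max(0, k - width + 1) : k + 1]), via a monotone deque."""
    out, dq = [], deque()
    for k, v in enumerate(vals):
        while dq and vals[dq[-1]] >= v:
            dq.pop()
        dq.append(k)
        if dq[0] <= k - width:  # front index has slid out of the window
            dq.popleft()
        out.append(vals[dq[0]])
    return out


def suffix_min(vals):
    """out[s] = min(vals[s:])."""
    out = list(vals)
    for i in range(len(out) - 2, -1, -1):
        out[i] = min(out[i], out[i + 1])
    return out


def block(top, left, same_letter):
    """Hypothetical helper: given the top row (q+1 cells) and left column
    (p+1 cells) of the sub-matrix for one pair of runs, return its bottom
    row and right column in O(p + q) time. Assumes unit-cost Levenshtein."""
    p, q = len(left) - 1, len(top) - 1
    if same_letter:
        # Equal letters: each interior cell equals its diagonal predecessor.
        bottom = [top[k - p] if k >= p else left[p - k] for k in range(q + 1)]
        right = [left[r - q] if r >= q else top[q - r] for r in range(p + 1)]
        return bottom, right
    # Different letters: every step costs 1, so covering dr rows and dc
    # columns costs max(dr, dc). Because neighbouring cells differ by at
    # most 1, the minimum over border cells reduces to window/suffix minima.
    wt, wl = window_min(top, p + 1), window_min(left, q + 1)
    st, sl = suffix_min(top), suffix_min(left)
    bottom = [min(p + wt[k], k + sl[max(0, p - k)]) for k in range(q + 1)]
    right = [min(q + wl[r], r + st[max(0, q - r)]) for r in range(p + 1)]
    return bottom, right


def rle_edit_distance(a, b):
    """Levenshtein distance, filling only rows/columns where a run ends."""
    n, runs_b = len(b), rle(b)
    row = list(range(n + 1))  # D[i0][0..n], starting with i0 = 0
    i0 = 0
    for ch_a, p in rle(a):
        left = [i0 + r for r in range(p + 1)]  # D[i0..i0+p][0]
        new_row = [i0 + p] + [0] * n  # D[i0+p][0..n], filled block by block
        j0 = 0
        for ch_b, q in runs_b:
            bottom, right = block(row[j0:j0 + q + 1], left, ch_a == ch_b)
            new_row[j0:j0 + q + 1] = bottom
            left = right  # right column becomes the next block's left column
            j0 += q
        row = new_row
        i0 += p
    return row[n]
```

For example, `rle_edit_distance("kitten", "sitting")` returns 3, the same value as the ordinary O(mn) table. Each run of `a` triggers O(p·n' + n) work, which sums to O(m'n + n'm) over all runs; the monotone deque is what keeps the different-letter blocks linear in p + q.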