Bit-Parallel Approach to Approximate String Matching in Compressed Texts

Authors:
T. Matsumoto;T. Kida;M. Takeda;A. Shinohara;S. Arikawa
Venue:
SPIRE '00 Proceedings of the Seventh International Symposium on String Processing and Information Retrieval (SPIRE'00)
Year:
2000

Cited by (10):

Approximate Matching of Run-Length Compressed Strings
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching

Multiple Pattern Matching Algorithms on Collage System
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching

Regular Expression Searching over Ziv-Lempel Compressed Text
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching

Collage system: a unifying framework for compressed pattern matching
Theoretical Computer Science - Selected papers in honour of Setsuo Arikawa

Faster Approximate String Matching over Compressed Text
DCC '01 Proceedings of the Data Compression Conference

Compressed Pattern Matching for Sequitur
DCC '01 Proceedings of the Data Compression Conference

Regular expression searching on compressed text
Journal of Discrete Algorithms

Approximate string matching on Ziv-Lempel compressed text
Journal of Discrete Algorithms

Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts
ACM Transactions on Algorithms (TALG)

Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching

Abstract

We address the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that the problem fits in a single machine word, e.g. $(m-k)(k+1) \le L$, where $m$ is the pattern length, $k$ is the number of allowed errors, and $L$ is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in $O(k^2(\|\mathcal{D}\| + |\mathcal{S}|) + km)$ time using $O(k^2 \|\mathcal{D}\|)$ space, where $\|\mathcal{D}\|$ is the size of the dictionary $\mathcal{D}$ and $|\mathcal{S}|$ is the number of tokens in the sequence $\mathcal{S}$. The LZ78 (Lempel-Ziv, 1978) and LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since we can regard $n = \|\mathcal{D}\| + |\mathcal{S}|$ as the compressed length, the time and space complexities are $O(k^2 n + km)$ and $O(k^2 n)$, respectively. For general $k$ and $m$, they become $O(k^3 mn/L + km)$ and $O(k^3 mn/L)$. Thus, our algorithm is competitive with the algorithm proposed by J. Kärkkäinen et al. (2000), which runs in $O(kmn)$ time using $O(kmn)$ space, when $k = O(\sqrt{L})$.
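
To make the idea of bit-parallelism concrete, the following is a minimal Python sketch of approximate matching on plain, uncompressed text. It is not the algorithm of this paper: it swaps in the well-known Wu-Manber Shift-And scheme with k errors, which keeps k+1 bit vectors of m bits each, whereas the abstract describes a packing that needs (m-k)(k+1) bits and operates on collage systems. The names `approx_search` and `fits_in_one_word`, and the default word size of 64 bits, are illustrative assumptions.

```python
def fits_in_one_word(m: int, k: int, word_bits: int = 64) -> bool:
    """Word-fit condition quoted in the abstract: (m - k)(k + 1) <= L."""
    return (m - k) * (k + 1) <= word_bits


def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Return 0-based end positions in `text` where `pattern` matches
    with at most k edits (insertion, deletion, substitution).

    Wu-Manber Shift-And with errors; this is a stand-in technique,
    not the collage-system algorithm of the paper.
    """
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    mask = (1 << m) - 1          # keep only the m low bits (Python ints are unbounded)
    accept = 1 << (m - 1)        # bit m-1 set means the whole pattern is matched

    # char_mask[c] has bit i set iff pattern[i] == c
    char_mask: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        char_mask[ch] = char_mask.get(ch, 0) | (1 << i)

    # states[d] bit i set: pattern[:i+1] matches a suffix of the text read
    # so far with at most d errors. Initially only d deletions are possible.
    states = [((1 << d) - 1) & mask for d in range(k + 1)]

    ends = []
    for j, ch in enumerate(text):
        b = char_mask.get(ch, 0)
        old_prev = states[0]                      # row d-1 before this character
        states[0] = ((states[0] << 1) | 1) & b    # exact-match row
        for d in range(1, k + 1):
            old_cur = states[d]
            states[d] = (
                (((old_cur << 1) | 1) & b)        # match
                | old_prev                        # insertion in text
                | (old_prev << 1)                 # substitution
                | (states[d - 1] << 1)            # deletion from pattern
                | 1                               # prefix of length 1 always within 1 error
            ) & mask
            old_prev = old_cur
        if states[k] & accept:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_search("abc", "xxabcxx", 0))   # [4]
    print(approx_search("abc", "abd", 1))       # [1, 2]
    print(fits_in_one_word(20, 2))              # True, since 18 * 3 = 54 <= 64
```

Each text character costs O(k) word operations here as long as the pattern fits in one word; the paper's contribution, per the abstract, is to obtain comparable bit-parallel speedups while processing the compressed representation directly.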