Approximate Matching of Run-Length Compressed Strings. Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM '01).
Multiple Pattern Matching Algorithms on Collage System. Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM '01).
Regular Expression Searching over Ziv-Lempel Compressed Text. Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching (CPM '01).
Collage system: a unifying framework for compressed pattern matching. Theoretical Computer Science (Selected Papers in Honour of Setsuo Arikawa).
Faster Approximate String Matching over Compressed Text. Proceedings of the Data Compression Conference (DCC '01).
Compressed Pattern Matching for Sequitur. Proceedings of the Data Compression Conference (DCC '01).
Regular expression searching on compressed text. Journal of Discrete Algorithms.
Approximate string matching on Ziv-Lempel compressed text. Journal of Discrete Algorithms.
Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts. ACM Transactions on Algorithms (TALG).
Improved approximate string matching and regular expression matching on Ziv-Lempel compressed texts. Proceedings of the 18th Annual Conference on Combinatorial Pattern Matching (CPM '07).
This paper addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that the problem fits in a single machine word, i.e. $(m-k)(k+1) \le L$, where $m$ is the pattern length, $k$ is the number of allowed errors, and $L$ is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in $O(k^2(\|\mathcal{D}\| + |\mathcal{S}|) + km)$ time using $O(k^2\|\mathcal{D}\|)$ space, where $\|\mathcal{D}\|$ is the size of the dictionary $\mathcal{D}$ and $|\mathcal{S}|$ is the number of tokens in the sequence $\mathcal{S}$. The LZ78 (Lempel-Ziv, 1978) and LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since $n = \|\mathcal{D}\| + |\mathcal{S}|$ can be regarded as the compressed length, the time and space complexities are $O(k^2 n + km)$ and $O(k^2 n)$, respectively. For general $k$ and $m$, they become $O(k^3 mn/L + km)$ and $O(k^3 mn/L)$. Thus, when $k = O(\sqrt{L})$, our algorithm is competitive with the algorithm proposed by J. Kärkkäinen et al. (2000), which runs in $O(kmn)$ time using $O(kmn)$ space.
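As a hedged illustration of bit-parallel approximate matching in general, the Python sketch below works on plain, uncompressed text. It uses the simpler row-wise bit-parallel NFA of Wu and Manber, which keeps k+1 bit vectors of m bits each. It is not the (m-k)(k+1)-bit packing implied by the word-size condition above, and it is not the authors' collage-system algorithm. The names `approx_search` and `fits_single_word` are hypothetical, and Python's unbounded integers stand in for a machine word of L bits.

```python
def fits_single_word(m: int, k: int, word_bits: int = 64) -> bool:
    """Word-size condition quoted in the abstract: (m - k)(k + 1) <= L."""
    return (m - k) * (k + 1) <= word_bits


def approx_search(pattern: str, text: str, k: int) -> list[int]:
    """Return every end position j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Row-wise bit-parallel NFA (Wu-Manber style), NOT the paper's method.
    """
    m = len(pattern)
    if not 0 <= k < m:
        raise ValueError("requires 0 <= k < len(pattern)")
    mask = (1 << m) - 1
    accept = 1 << (m - 1)

    # B[c] has bit i set iff pattern[i] == c.
    B: dict[str, int] = {}
    for i, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << i)

    # Bit i of R[d]: pattern[:i+1] matches a suffix of the text read so far
    # with at most d errors. Initially, the first d pattern chars can be deleted.
    R = [(1 << d) - 1 for d in range(k + 1)]
    ends: list[int] = []
    for j, ch in enumerate(text):
        b = B.get(ch, 0)
        prev_old = R[0]  # R[d-1] before reading ch
        R[0] = ((R[0] << 1) | 1) & b  # exact matching row
        for d in range(1, k + 1):
            cur_old = R[d]
            R[d] = (
                (((cur_old << 1) | 1) & b)  # match: advance on equal chars
                | prev_old                  # insertion: text char skipped
                | ((prev_old << 1) | 1)     # substitution
                | ((R[d - 1] << 1) | 1)     # deletion: uses the NEW R[d-1]
            ) & mask
            prev_old = cur_old
        if R[k] & accept:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_search("abc", "xabxabcx", 1))  # [2, 3, 5, 6, 7]
```

Each text character costs O(k) word operations here because the k+1 rows are kept in separate words. The abstract's condition instead bounds a single-word representation of the whole automaton, which is what makes the single-word case cheaper.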