Bit-Parallel Approach to Approximate String Matching in Compressed Texts

  • Authors:
  • T. Matsumoto;T. Kida;M. Takeda;A. Shinohara;S. Arikawa

  • Affiliations:
  • -;-;-;-;-

  • Venue:
  • SPIRE '00 Proceedings of the Seventh International Symposium on String Processing Information Retrieval (SPIRE'00)
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

Addresses the problem of approximate string matching on compressed text. We consider this problem for a text string described in terms of a collage system, which is a formal system proposed by T. Kida et al. (1999) that captures various dictionary-based compression methods. We present an algorithm that exploits bit-parallelism, assuming that our problem fits in a single machine word, e.g. (m-k)(k+1)/spl les/L, where m is the pattern length, k is the number of allowed errors and L is the length, in bits, of the machine word. For a class of simple collage systems, the algorithm runs in O(k/sup 2/(/spl par//spl Dscr//spl par/+|/spl Sscr/|)+km) time using O(k/sup 2//spl par//spl Dscr//spl par/) space, where /spl par//spl Dscr//spl par/ is the size of dictionary /spl Dscr/ and |/spl Sscr/| is the number of tokens in the sequence /spl Sscr/. The LZ78 (Lempel-Ziv, 1978) and the LZW (Lempel-Ziv-Welch, 1984) compression methods are covered by this class. Since we can regard n=/spl par//spl Dscr//spl par/+|/spl Sscr/| as the compressed length, the time and space complexities are O(k/sup 2/n+km) and O(k/sup 2/n), respectively. For general k and m, they become O(k/sup 3/mn/L+km) and O(k/sup 3/mn/L). Thus, our algorithm is competitive to the algorithm proposed by J. Ka/spl uml/rkka/spl uml/inen, et al. (2000), which runs in O(km) time using O(kmn) space, when k=O(/spl radic/L).