Processing compressed texts: a tractability border

  • Authors: Yury Lifshits
  • Affiliations: Steklov Institute of Mathematics at St. Petersburg, Russia
  • Venue: CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
  • Year: 2007

Abstract

What kinds of operations can we perform efficiently (without full unpacking) on compressed texts? In this paper we consider three fundamental problems: (1) checking the equality of two compressed texts, (2) checking whether one compressed text is a substring of another compressed text, and (3) computing the number of positions at which two compressed texts of the same length differ (the Hamming distance). We present an algorithm that solves the first problem in O(n^3) time and the second problem in O(n^2 m) time. Here n is the size of the compressed representation of the text (we consider representations by straight-line programs) and m is the size of the compressed representation of the pattern. Next, we prove that the third problem is #P-complete. Thus, we exhibit a pair of similar problems (equivalence checking and Hamming distance computation) that have radically different complexity on compressed texts. The algorithmic technique used for problems (1) and (2) also helps in computing minimal periods and covers of compressed texts.
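
To make the setting concrete, here is a minimal Python sketch of a straight-line program (SLP) and of what "without full unpacking" can mean. It is an illustration only and not the algorithm from the paper. The list-based encoding and the names `slp_lengths`, `char_at`, `expand`, and `hamming_naive` are assumptions made for this example. A rule is either a single character or a pair of indices of earlier rules whose texts are concatenated.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: rule i is a terminal character or a pair (j, k)
# with j, k < i, meaning text(i) = text(j) + text(k).
Rule = Union[str, Tuple[int, int]]


def slp_lengths(rules: List[Rule]) -> List[int]:
    """Length of the text produced by every rule, in time linear in len(rules)."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def char_at(rules: List[Rule], lengths: List[int], index: int, pos: int) -> str:
    """Character at position pos of text(index), found by walking down the rules."""
    while not isinstance(rules[index], str):
        left, right = rules[index]
        if pos < lengths[left]:
            index = left
        else:
            pos -= lengths[left]
            index = right
    return rules[index]


def expand(rules: List[Rule], index: int) -> str:
    """Full unpacking; the result can be exponentially longer than the SLP."""
    rule = rules[index]
    if isinstance(rule, str):
        return rule
    return expand(rules, rule[0]) + expand(rules, rule[1])


def hamming_naive(a: List[Rule], b: List[Rule]) -> int:
    """Baseline only: compares position by position, so its running time is
    proportional to the uncompressed length, not to the SLP size."""
    len_a, len_b = slp_lengths(a), slp_lengths(b)
    if len_a[-1] != len_b[-1]:
        raise ValueError("texts must have the same length")
    last_a, last_b = len(a) - 1, len(b) - 1
    return sum(
        char_at(a, len_a, last_a, i) != char_at(b, len_b, last_b, i)
        for i in range(len_a[-1])
    )


# Fibonacci words: six rules describe the text "abaababa".
fib: List[Rule] = ["b", "a", (1, 0), (2, 1), (3, 2), (4, 3)]
# Repeated doubling: five rules describe "abababab".
alt: List[Rule] = ["a", "b", (0, 1), (2, 2), (3, 3)]
# 61 rules describe a text of length 2**60, far too long to unpack.
huge: List[Rule] = ["a"] + [(i, i) for i in range(60)]
huge_lengths = slp_lengths(huge)
```

In this sketch, lengths and single characters of `huge` are available after a few dozen steps, whereas `hamming_naive` has to visit every position of the uncompressed text. That position-by-position baseline is the kind of work the paper's polynomial-time algorithms for problems (1) and (2) avoid, and the #P-completeness result suggests that no comparable shortcut should be expected for problem (3).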