Faster Text Fingerprinting

  • Authors:
  • Roman Kolpakov; Mathieu Raffinot

  • Affiliations:
  • Liapunov French-Russian Institute, Lomonosov Moscow State University, Moscow, Russia; CNRS, LIAFA, Univ. Paris Diderot - Paris 7, Paris Cedex 13, France 75205

  • Venue:
  • SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2008

Abstract

Let $s = s_1 .. s_n$ be a text (or sequence) on a finite alphabet $\Sigma$. A fingerprint in $s$ is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists in computing the set ${\cal F}$ of all fingerprints of all its substrings. A fingerprint $f \in {\cal F}$ admits a number of maximal locations $\langle i,j \rangle$ in $s$, that is, the alphabet of $s_i .. s_j$ is $f$ and $s_{i-1}$, $s_{j+1}$, if defined, are not in $f$. The set of maximal locations is ${\cal L}$, with $|{\cal L}| \leq n |\Sigma|$. Two maximal locations $\langle i,j \rangle$ and $\langle k,l \rangle$ such that $s_i .. s_j = s_k .. s_l$ are named copies, and the quotient of ${\cal L}$ according to the copy relation is named ${\cal L}_C$. The fastest known algorithm to compute all fingerprints in $s$ runs in $O(n+|{\cal L}|\log |\Sigma|)$ time. We present an $O((n+|{\cal L}_C|)\log |\Sigma|)$ worst-case time algorithm.
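
To make the definitions above concrete, here is a minimal Python sketch that enumerates fingerprints, maximal locations and copy classes by brute force. It is only an illustration of the definitions, not the algorithm of the paper: it runs in quadratic time, and the function names `maximal_locations` and `copy_classes` are hypothetical names chosen for this example. Positions are reported 1-based and inclusive, matching the $\langle i,j \rangle$ notation.

```python
def maximal_locations(s):
    """Map each fingerprint (a frozenset of characters) to its maximal locations.

    Naive quadratic enumeration, for illustration only.
    Locations are (i, j) pairs, 1-based and inclusive.
    """
    n = len(s)
    result = {}
    for i in range(n):
        seen = set()
        j = i
        while j < n:
            seen.add(s[j])
            # Extend to the right while the next character adds nothing new.
            while j + 1 < n and s[j + 1] in seen:
                j += 1
            # s[i..j] is now right-maximal; check left-maximality.
            if i == 0 or s[i - 1] not in seen:
                result.setdefault(frozenset(seen), []).append((i + 1, j + 1))
            j += 1  # the next character (if any) enlarges the fingerprint
    return result


def copy_classes(s, locations):
    """Group maximal locations whose substrings are equal (the copy relation)."""
    classes = {}
    for locs in locations.values():
        for (i, j) in locs:
            classes.setdefault(s[i - 1:j], []).append((i, j))
    return classes


if __name__ == "__main__":
    text = "abab"
    locs = maximal_locations(text)
    for fingerprint, where in locs.items():
        print(sorted(fingerprint), where)
    print("number of maximal locations:", sum(len(v) for v in locs.values()))
    print("number of copy classes:", len(copy_classes(text, locs)))
```

On the text `abab`, the fingerprints are {a}, {b} and {a, b}. There are five maximal locations, but only three copy classes, because the two occurrences of `a` are copies of each other, as are the two occurrences of `b`. This gap between $|{\cal L}|$ and $|{\cal L}_C|$ is what the stated bound exploits.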