Faster Text Fingerprinting

Authors:
Roman Kolpakov;Mathieu Raffinot
Affiliations:
Liapunov French-Russian Institute, Lomonosov Moscow State University, Moscow, Russia;CNRS, LIAFA, Univ. Paris Diderot - Paris 7, Paris Cedex 13, France 75205
Venue:
SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
Year:
2008

Abstract

Let $s = s_1 .. s_n$ be a text (or sequence) on a finite alphabet $\Sigma$. A fingerprint in $s$ is the set of distinct characters contained in one of its substrings. Fingerprinting a text consists in computing the set ${\cal F}$ of all fingerprints of all its substrings. A fingerprint $f \in {\cal F}$ admits a number of maximal locations $\langle i,j \rangle$ in $s$, that is, the alphabet of $s_i .. s_j$ is $f$ and $s_{i-1}$, $s_{j+1}$, if defined, are not in $f$. The set of maximal locations is ${\cal L}$, with $|{\cal L}| \leq n |\Sigma|$. Two maximal locations $\langle i,j \rangle$ and $\langle k,l \rangle$ such that $s_i .. s_j = s_k .. s_l$ are named copies, and the quotient of ${\cal L}$ according to the copy relation is named ${\cal L}_C$. The fastest known algorithm to compute all fingerprints in $s$ runs in $O(n+|{\cal L}|\log |\Sigma|)$ time. We present an $O((n+|{\cal L}_C|)\log |\Sigma|)$ worst case time algorithm.
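
To make the definitions concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It enumerates the maximal locations of every non-empty fingerprint in quadratic time and then groups them into copy classes. The function names `maximal_locations` and `copy_classes` are hypothetical, positions are reported 1-based and inclusive to match the $\langle i,j \rangle$ notation, and the empty fingerprint is ignored.

```python
from collections import defaultdict


def maximal_locations(s):
    """Map each non-empty fingerprint of s to its maximal locations <i, j>.

    Brute force: O(n^2) substrings are examined, so this is only meant
    to illustrate the definitions, not to be efficient.
    """
    n = len(s)
    locations = defaultdict(list)
    for i in range(n):
        seen = set()  # alphabet of s[i..j], grown as j advances
        for j in range(i, n):
            seen.add(s[j])
            # Maximal: the neighbouring characters, if defined, are not in f.
            left_ok = i == 0 or s[i - 1] not in seen
            right_ok = j == n - 1 or s[j + 1] not in seen
            if left_ok and right_ok:
                locations[frozenset(seen)].append((i + 1, j + 1))
    return dict(locations)


def copy_classes(s, locations):
    """Group maximal locations that spell the same substring (copies)."""
    classes = defaultdict(list)
    for locs in locations.values():
        for i, j in locs:
            classes[s[i - 1:j]].append((i, j))
    return dict(classes)


if __name__ == "__main__":
    text = "abab"
    locs = maximal_locations(text)
    for fingerprint, where in locs.items():
        print(sorted(fingerprint), where)
    print(len(copy_classes(text, locs)), "copy classes")
```

For the text `abab`, this yields five maximal locations in total (two for $\{a\}$, two for $\{b\}$, one for $\{a,b\}$) but only three copy classes, which illustrates why a bound in terms of $|{\cal L}_C|$ can be smaller than one in terms of $|{\cal L}|$.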