Various improvements to text fingerprinting

Authors:
Djamal Belazzougui;Roman Kolpakov;Mathieu Raffinot
Affiliations:
Department of Computer Science, University of Helsinki, FI-00014, Finland;Liapunov French-Russian Institute, Lomonosov Moscow State University, Moscow, Russia;LIAFA, Univ. Paris Diderot - Paris 7, 75205 Paris Cedex 13, France
Venue:
Journal of Discrete Algorithms
Year:
2013

Abstract

Let $s = s_1 \ldots s_n$ be a text (or sequence) over a finite alphabet $\Sigma$ of size $\sigma$. A fingerprint in $s$ is the set of distinct characters appearing in one of its substrings. The problem considered here is to compute the set $F$ of all fingerprints of all substrings of $s$ in order to answer certain questions about this set efficiently. A substring $s_i \ldots s_j$ is a maximal location for a fingerprint $f \in F$ (denoted by ) if the alphabet of $s_i \ldots s_j$ is $f$ and $s_{i-1}$, $s_{j+1}$, if defined, are not in $f$. The set of maximal locations in $s$ is $L$ (it is easy to see that $|L| \le n\sigma$). Two maximal locations $s_i \ldots s_j$ and $s_k \ldots s_l$ such that $s_i \ldots s_j = s_k \ldots s_l$ are named copies, and the quotient set of $L$ according to the copy relation is denoted by $L_C$. We first present new exact efficient algorithms and data structures for the following three problems: (1) to compute $F$; (2) given $f$ as a set of distinct characters in $\Sigma$, to decide whether $f$ is a fingerprint in $F$; (3) given $f$, to find all maximal locations of $f$ in $s$. As in papers on succinct data structures, all space complexities in this paper are counted in bits. Problem 1 is solved either in $O(n + |L_C| \log \sigma)$ worst-case time (all logarithms in this paper are base two) using $O((n + |L_C| + |F| \log \sigma) \log n)$ bits of space, or in $O(n + |L| \log \sigma)$ randomized expected time using $O((n + |F| \log \sigma) \log n)$ bits of space. Problem 2 is solved either in $O(|f|)$ expected time if only $O(|f| \log n)$ bits of working space for queries is allowed, or in worst-case $O(|f| / \varepsilon)$ time if a working space of $O(\sigma^{\varepsilon} \log n)$ bits is allowed (with $\varepsilon$ a constant satisfying 0
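
The definitions of fingerprint, maximal location and copy can be made concrete with a small brute-force sketch. It is not one of the algorithms of the paper and makes no attempt to meet the stated bounds (it takes quadratic time in the worst case). The names `maximal_locations` and `copy_classes` are hypothetical, chosen only for this illustration, and positions are reported 1-based and inclusive to match the notation $s_i \ldots s_j$.

```python
from collections import defaultdict


def maximal_locations(s):
    """Map each fingerprint (a frozenset of characters) to its maximal locations.

    Brute force for illustration only, not the paper's algorithm.
    Positions are 1-based and inclusive.
    """
    n = len(s)
    table = defaultdict(list)
    for i in range(n):
        seen = set()
        j = i
        while j < n:
            seen.add(s[j])
            # Extend to the right while the next character adds nothing new.
            while j + 1 < n and s[j + 1] in seen:
                j += 1
            # s[i..j] is right-maximal; it must also be left-maximal.
            if i > 0 and s[i - 1] in seen:
                break  # every longer extension from i also contains s[i-1]
            table[frozenset(seen)].append((i + 1, j + 1))
            j += 1  # the next character, if any, is new to the fingerprint
    return dict(table)


def copy_classes(s, table):
    """Return the distinct substrings spelled by maximal locations.

    Two maximal locations are copies when they spell the same substring,
    so the size of this set is the number of classes in the quotient set.
    """
    return {s[i - 1:j] for locs in table.values() for (i, j) in locs}


table = maximal_locations("aba")
print(sorted("".join(sorted(f)) for f in table))  # Problem 1: all fingerprints
print(frozenset("ab") in table)                   # Problem 2: membership query
print(table[frozenset("a")])                      # Problem 3: maximal locations
print(len(copy_classes("aba", table)))            # number of copy classes
```

For the text `aba`, the fingerprint `{a}` has two maximal locations, positions 1..1 and 3..3, which are copies of each other, so there are four maximal locations but only three copy classes.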