Efficient text fingerprinting via Parikh mapping

  • Authors:
  • Amihood Amir; Alberto Apostolico; Gad M. Landau; Giorgio Satta

  • Affiliations:
  • Department of Mathematics and Computer Science, Bar-Ilan University, 52900 Ramat-Gan, Israel and College of Computing, Georgia Institute of Technology, Atlanta, GA; Dipartimento di Elettronica e Informatica, Università di Padova, Via Gradenigo 6/A, 35131 Padova, Italy and Department of Computer Sciences, Purdue University, Computer Sciences Building, Wes ...; Department of Computer Science, Haifa University, Haifa 31905, Israel and Department of Computer and Information Science, Polytechnic University, Six MetroTech Center, Brooklyn, NY; Dipartimento di Elettronica e Informatica, Università di Padova, Via Gradenigo 6/A, 35131 Padova, Italy

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2003

Abstract

We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S' is a substring of S, then the fingerprint of S' is the subset φ of Σ consisting of precisely the symbols that appear in S'. In this paper we show efficient methods for answering various queries on fingerprint statistics. Our preprocessing takes time O(n |Σ| log n log |Σ|) and enables answering the following queries: (1) Given an integer k, compute the number of distinct fingerprints of size k in time O(1). (2) Given a set φ ⊆ Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in time O(|Σ| log n).
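
To make the definitions concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it enumerates all O(n²) substrings directly and makes no attempt to match the stated preprocessing or query bounds. The function names are hypothetical, and it assumes that an "occurrence" in query (2) means a pair of start and end positions (i, j) whose substring has exactly the fingerprint φ; the abstract does not spell out this detail.

```python
from collections import defaultdict


def fingerprint_table(s):
    """Brute force: map each fingerprint (a frozenset of symbols) to the
    number of position pairs (i, j) whose substring s[i..j] has it."""
    table = defaultdict(int)
    n = len(s)
    for i in range(n):
        seen = set()
        for j in range(i, n):
            seen.add(s[j])                 # fingerprint of s[i..j]
            table[frozenset(seen)] += 1
    return table


def count_fingerprints_of_size(table, k):
    """Query (1): number of distinct fingerprints containing exactly k symbols."""
    return sum(1 for fp in table if len(fp) == k)


def count_occurrences(table, phi):
    """Query (2), under the assumption above: number of (i, j) pairs
    whose substring has fingerprint exactly phi."""
    return table.get(frozenset(phi), 0)


if __name__ == "__main__":
    t = fingerprint_table("abca")
    print(count_fingerprints_of_size(t, 2))   # fingerprints {a,b}, {b,c}, {a,c}
    print(count_occurrences(t, {"a", "b", "c"}))
```

For S = "abca", the substrings "abc", "abca" and "bca" all share the fingerprint {a, b, c}, while "a" occurs at two positions with fingerprint {a}. A table like this is only practical for short strings; the point of the paper is to answer the same queries after far cheaper preprocessing.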