Counting colours in compressed strings

  • Authors: Travis Gagie; Juha Kärkkäinen
  • Affiliations: Aalto University, Finland; University of Helsinki, Finland
  • Venue: CPM'11, Proceedings of the 22nd Annual Conference on Combinatorial Pattern Matching
  • Year: 2011

Abstract

Motivated by the problem of counting unique visitors to a website, we consider how to preprocess a string s[1..n] so that later, given the endpoints of a substring, we can quickly count how many distinct characters that substring contains. The smallest reasonably fast previous data structure for this problem takes n log σ + O(n log log n) bits, where σ is the alphabet size, and answers queries in O(log n) time. We give a data structure that takes nH0(s) + O(n) + o(nH0(s)) bits, where H0(s) is the 0th-order empirical entropy of s, and answers queries in O(log l) time, where l is the length of the query substring. As far as we know, this is the first data structure whose query time depends only on l and not on n. We also show how our data structure can be made partially dynamic.
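
To make the query concrete, here is a minimal uncompressed baseline in Python. It is not the data structure described in the abstract: it assumes the standard reduction in which position k contributes to the count for s[i..j] exactly when the previous occurrence of s[k] lies before i, and it answers a query in time linear in the substring length rather than O(log l). The function names build_prev, count_distinct and h0 are illustrative and do not come from the paper; h0 computes the 0th-order empirical entropy in bits per character, the quantity H0(s) that appears in the space bound.

```python
import math
from collections import Counter


def build_prev(s):
    """prev[k] = index of the previous occurrence of s[k], or -1 if there is none."""
    last = {}
    prev = []
    for k, ch in enumerate(s):
        prev.append(last.get(ch, -1))
        last[ch] = k
    return prev


def count_distinct(prev, i, j):
    """Count distinct characters in s[i..j] (0-based, inclusive).

    A position k in [i, j] is the first occurrence of its character inside
    the substring exactly when prev[k] < i, so counting such positions
    counts each distinct character once. Naive scan: O(j - i + 1) time.
    """
    return sum(1 for k in range(i, j + 1) if prev[k] < i)


def h0(s):
    """0th-order empirical entropy of s, in bits per character."""
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


s = "abracadabra"
prev = build_prev(s)
print(count_distinct(prev, 0, len(s) - 1))  # distinct characters in the whole string
print(count_distinct(prev, 3, 5))           # distinct characters in "aca"
print(h0(s))                                # bits per character
```

The array prev alone takes about n log n bits, which is the kind of overhead the compressed structure in the abstract is designed to avoid.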