Counting colours in compressed strings

Authors:
Travis Gagie; Juha Kärkkäinen
Affiliations:
Aalto University, Finland; University of Helsinki, Finland
Venue:
CPM'11 Proceedings of the 22nd annual conference on Combinatorial pattern matching
Year:
2011

Abstract

Motivated by the problem of counting unique visitors to a website, we consider how to preprocess a string s[1..n] such that later, given a substring's endpoints, we can quickly count how many distinct characters that substring contains. The smallest reasonably fast previous data structure for this problem takes n log σ + O(n log log n) bits, where σ is the alphabet size, and answers queries in O(log n) time. We give a data structure for this problem that takes nH0(s) + O(n) + o(nH0(s)) bits, where H0(s) is the 0th-order empirical entropy of s, and answers queries in O(log l) time, where l is the length of the query substring. As far as we know, this is the first data structure whose query time depends only on l and not on n. We also show how our data structure can be made partially dynamic.
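
To make the query concrete, here is a minimal baseline sketch in Python. It is not the data structure described in the abstract: it uses the well-known reduction in which a position counts toward the answer exactly when the previous occurrence of its character lies before the start of the query range. The function names `build_prev` and `count_distinct` are hypothetical, indices are 0-based (the abstract uses s[1..n]), and a query takes time linear in l rather than O(log l).

```python
from typing import Dict, List


def build_prev(s: str) -> List[int]:
    """Return prev, where prev[i] is the index of the previous
    occurrence of s[i], or -1 if s[i] has not appeared before."""
    last: Dict[str, int] = {}
    prev: List[int] = []
    for i, c in enumerate(s):
        prev.append(last.get(c, -1))
        last[c] = i
    return prev


def count_distinct(prev: List[int], lo: int, hi: int) -> int:
    """Count distinct characters in s[lo..hi] (inclusive, 0-based).

    Position i holds the first occurrence of its character within
    the range exactly when prev[i] < lo, so counting those positions
    counts each distinct character once.
    """
    return sum(1 for i in range(lo, hi + 1) if prev[i] < lo)


prev = build_prev("abracadabra")
whole = count_distinct(prev, 0, 10)   # the whole string
middle = count_distinct(prev, 3, 5)   # the substring "aca"
```

The uncompressed `prev` array alone takes about n log n bits, which is the kind of space overhead the compressed structures discussed in the abstract are designed to avoid.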