Journal of Discrete Algorithms
Motivated by the problem of counting unique visitors to a website, we consider how to preprocess a string s[1..n] so that later, given a substring's endpoints, we can quickly count how many distinct characters that substring contains. The smallest reasonably fast previous data structure for this problem takes n log σ + O(n log log n) bits, where σ is the alphabet size, and answers queries in O(log n) time. We give a data structure for this problem that takes nH0(s) + O(n) + o(nH0(s)) bits, where H0(s) is the 0th-order empirical entropy of s, and answers queries in O(log l) time, where l is the length of the query substring. As far as we know, this is the first data structure for this problem whose query time depends only on l and not on n. We also show how our data structure can be made partially dynamic.
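
To make the query concrete, the following is a minimal Python sketch of the problem itself, not of the data structure described in the abstract. It assumes a standard reduction (our choice for illustration, not taken from the abstract): a position i in the query range holds the first occurrence of its character within that range exactly when the previous occurrence of that character lies before the range. The function names `prev_occurrences` and `count_distinct` are hypothetical, indices are 0-based rather than the 1-based s[1..n] used above, and each query takes time linear in the substring length rather than O(log l).

```python
def prev_occurrences(s):
    """Return prev, where prev[i] is the index of the previous
    occurrence of s[i] in s, or -1 if s[i] has not appeared before."""
    last = {}
    prev = []
    for i, ch in enumerate(s):
        prev.append(last.get(ch, -1))
        last[ch] = i
    return prev


def count_distinct(prev, lo, hi):
    """Count distinct characters in s[lo..hi] (0-based, inclusive).

    Position i is the first occurrence of its character inside the
    range exactly when prev[i] < lo, so we count those positions.
    This naive scan is linear in the range length; it only
    illustrates what the query computes.
    """
    return sum(1 for i in range(lo, hi + 1) if prev[i] < lo)


if __name__ == "__main__":
    s = "abracadabra"
    prev = prev_occurrences(s)
    # s[3..7] is "acada", which contains the characters a, c, d.
    print(count_distinct(prev, 3, 7))
```

In the website example, each character would stand for a visitor identifier and the substring for a time window of the access log, so the query returns the number of unique visitors in that window.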