We show how to store a compressed two-dimensional array such that, if we are asked for the elements with high relative frequency in a range, we can quickly return a short list of candidates that includes them. More specifically, given an m × n array A and a fraction α > 0, we can store A in O(mn(H + 1) log²(1/α)) bits, where H is the entropy of the elements' distribution in A. Later, given a rectangular range in A and a fraction β ≥ α, we can return in O(1/β) time a list of O(1/β) distinct array elements that includes every element with relative frequency at least β in that range (at most 1/β elements can reach that frequency, since relative frequencies sum to 1). We do not verify that the elements in the list have relative frequency at least β, so the list may contain false positives. When m = 1, i.e., A is a string, we improve this space bound by a factor of log(1/α) and explore a space-time trade-off for verifying the frequency of the elements in the list. This leads to an O(n min{log(1/α), H + 1} log n)-bit data structure for strings that, for β ≥ α, can return in O(1/β) time the O(1/β) elements that have relative frequency at least β in a given range, without false positives.
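The abstract states the query guarantees but not how the structure achieves them. The sketch below therefore only illustrates the query semantics on a small array. It is not the authors' method: it scans the whole rectangle, so a query costs time proportional to the range's area rather than O(1/β), and it stores nothing compressed. The candidate list comes from the Misra-Gries frequent-items summary, a swapped-in technique, run with k = ⌊1/β⌋ counters. A second pass over exact counts removes the false positives. The function names (`entropy`, `rect_elements`, `candidates`, `frequent`) and the inclusive-bounds range convention are hypothetical.

```python
from collections import Counter
from fractions import Fraction
import math


def entropy(values):
    """Empirical entropy H, in bits, of the distribution of elements in `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rect_elements(A, r1, r2, c1, c2):
    """Yield the elements of the rectangle A[r1..r2][c1..c2] (inclusive bounds), row by row."""
    for r in range(r1, r2 + 1):
        for c in range(c1, c2 + 1):
            yield A[r][c]


def candidates(A, r1, r2, c1, c2, beta):
    """Misra-Gries summary with k = floor(1/beta) counters over the range.

    Every element with relative frequency >= beta survives, so the result
    has no false negatives. It may contain false positives and has at most
    1/beta entries. Passing beta as a Fraction avoids float rounding in 1/beta.
    """
    k = math.floor(1 / beta)
    counters = {}
    for x in rect_elements(A, r1, r2, c1, c2):
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # No free counter: decrement every counter and drop those that reach zero.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return set(counters)


def frequent(A, r1, r2, c1, c2, beta):
    """Verify each candidate against exact counts, removing false positives."""
    cand = candidates(A, r1, r2, c1, c2, beta)
    counts = Counter(rect_elements(A, r1, r2, c1, c2))
    total = (r2 - r1 + 1) * (c2 - c1 + 1)
    return {x for x in cand if counts[x] >= beta * total}


A = [[1, 1, 2, 3],
     [1, 4, 1, 5],
     [6, 1, 7, 1]]
beta = Fraction(1, 4)
print(sorted(candidates(A, 0, 2, 0, 3, beta)))  # candidates 1, 6, 7; 6 and 7 are false positives
print(sorted(frequent(A, 0, 2, 0, 3, beta)))    # only 1 (6 of 12 cells) remains
print(entropy([x for row in A for x in row]))   # H of A's element distribution
```

Choosing k = ⌊1/β⌋ is enough because each decrement round discards k + 1 occurrences: the k decremented counters plus the incoming element. A range of N cells therefore has at most N/(k + 1) < βN rounds. Any element occurring at least βN times keeps a positive counter, and the list never exceeds 1/β entries.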