Finding frequent elements in compressed 2D arrays and strings

Authors:
Travis Gagie;Meng He;J. Ian Munro;Patrick K. Nicholson
Affiliations:
Department of Computer Science and Engineering, Aalto University, Finland;Cheriton School of Computer Science, University of Waterloo, Canada;Cheriton School of Computer Science, University of Waterloo, Canada;Cheriton School of Computer Science, University of Waterloo, Canada
Venue:
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
Year:
2011

Abstract

We show how to store a compressed two-dimensional array such that, if we are asked for the elements with high relative frequency in a range, we can quickly return a short list of candidates that includes them. More specifically, given an m × n array A and a fraction α > 0, we can store A in O(mn(H + 1) log²(1/α)) bits, where H is the entropy of the elements' distribution in A, such that later, given a rectangular range in A and a fraction β ≥ α, in O(1/β) time we can return a list of O(1/β) distinct array elements that includes all the elements that have relative frequency at least β in that range. We do not verify that the elements in the list have relative frequency at least β, so the list may contain false positives. In the case when m = 1, i.e., A is a string, we improve this space bound by a factor of log(1/α), and explore a space-time tradeoff for verifying the frequency of the elements in the list. This leads to an O(n min(log(1/α), H + 1) log n)-bit data structure for strings that, in O(1/β) time, can return the O(1/β) elements that have relative frequency at least β in a given range, without false positives, for β ≥ α.
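To make the query semantics concrete, the following is a minimal sketch of what a candidate list with possible false positives looks like. It is not the data structure described in the abstract: it scans the range in time linear in its size and uses no compression, whereas the abstract claims O(1/β) query time on a compressed representation. The sketch swaps in the classic Misra-Gries counter summary, and the function names `range_candidates` and `verify`, as well as the inclusive row and column bounds, are assumptions made for illustration.

```python
import math
from collections import Counter


def range_candidates(A, r0, r1, c0, c1, beta):
    """Return at most ceil(1/beta) candidates for the range
    A[r0..r1][c0..c1] (bounds inclusive, an assumed convention).

    Every element with relative frequency >= beta in the range is
    guaranteed to be in the list; other elements may appear too.
    """
    k = math.ceil(1 / beta)          # number of Misra-Gries counters
    counters = {}
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            v = A[r][c]
            if v in counters:
                counters[v] += 1
            elif len(counters) < k:
                counters[v] = 1
            else:
                # All k counters are taken: decrement each one and
                # drop those that reach zero (v itself is discarded).
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
    return list(counters)


def verify(A, r0, r1, c0, c1, beta, candidates):
    """Filter out false positives by exact counting over the range."""
    counts = Counter(
        A[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)
    )
    size = (r1 - r0 + 1) * (c1 - c0 + 1)
    return [v for v in candidates if counts[v] >= beta * size]


if __name__ == "__main__":
    A = [[1, 1, 2],
         [1, 3, 1],
         [4, 1, 1]]
    cands = range_candidates(A, 0, 2, 0, 2, 0.5)
    print(cands)                              # may include false positives
    print(verify(A, 0, 2, 0, 2, 0.5, cands))  # exact answer
```

The unverified list corresponds to the first result in the abstract (candidates that may include false positives), and the separate `verify` pass corresponds to the frequency verification that the abstract discusses for strings. Here verification is done by brute-force counting purely to show the distinction.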