Finding frequent elements in compressed 2D arrays and strings

  • Authors: Travis Gagie; Meng He; J. Ian Munro; Patrick K. Nicholson

  • Affiliations: Department of Computer Science and Engineering, Aalto University, Finland; Cheriton School of Computer Science, University of Waterloo, Canada; Cheriton School of Computer Science, University of Waterloo, Canada; Cheriton School of Computer Science, University of Waterloo, Canada

  • Venue: SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval
  • Year: 2011

Abstract

We show how to store a compressed two-dimensional array such that, if we are asked for the elements with high relative frequency in a range, we can quickly return a short list of candidates that includes them. More specifically, given an m × n array A and a fraction α > 0, we can store A in O(mn(H + 1) log²(1/α)) bits, where H is the entropy of the elements' distribution in A, such that later, given a rectangular range in A and a fraction β ≥ α, in O(1/β) time we can return a list of O(1/β) distinct array elements that includes all the elements that have relative frequency at least β in that range. We do not verify that the elements in the list have relative frequency at least β, so the list may contain false positives. In the case when m = 1, i.e., A is a string, we improve this space bound by a factor of log(1/α), and explore a space-time tradeoff for verifying the frequency of the elements in the list. This leads to an O(n min(log(1/α), H + 1) log n)-bit data structure for strings that, in O(1/β) time, can return the O(1/β) elements that have relative frequency at least β in a given range, without false positives, for β ≥ α.
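
To make the query semantics concrete, the following Python sketch answers the same kind of query by scanning the range directly. It is not the data structure described in the abstract: it takes time linear in the size of the range rather than O(1/β), and it uses no compression. It swaps in the classic Misra-Gries counter technique to produce a candidate list of at most ⌊1/β⌋ distinct elements that includes every element with relative frequency at least β and may contain false positives, followed by an optional exact-counting pass that removes them. The function names `candidate_frequent` and `frequent_exact` and the inclusive range convention are assumptions made for illustration.

```python
from math import floor


def candidate_frequent(A, r1, r2, c1, c2, beta):
    """Candidates for the beta-frequent elements of A[r1..r2][c1..c2] (inclusive).

    Illustrative linear scan (Misra-Gries), not the paper's structure.
    Returns at most floor(1/beta) distinct elements, including every element
    whose relative frequency in the range is at least beta. The list may
    contain false positives.
    """
    slots = floor(1 / beta)  # number of counters kept at any time
    counters = {}
    for i in range(r1, r2 + 1):
        for j in range(c1, c2 + 1):
            x = A[i][j]
            if x in counters:
                counters[x] += 1
            elif len(counters) < slots:
                counters[x] = 1
            else:
                # No free counter: decrement every counter and drop zeros.
                for y in list(counters):
                    counters[y] -= 1
                    if counters[y] == 0:
                        del counters[y]
    return list(counters)


def frequent_exact(A, r1, r2, c1, c2, beta):
    """Verify the candidates by exact counting, removing false positives."""
    cells = [A[i][j] for i in range(r1, r2 + 1) for j in range(c1, c2 + 1)]
    threshold = beta * len(cells)
    return [x for x in candidate_frequent(A, r1, r2, c1, c2, beta)
            if cells.count(x) >= threshold]


# Hypothetical 3 x 3 example: the value 1 fills 6 of the 9 cells.
A = [[1, 1, 2],
     [1, 3, 1],
     [2, 1, 1]]
candidates = candidate_frequent(A, 0, 2, 0, 2, 0.5)
verified = frequent_exact(A, 0, 2, 0, 2, 0.5)
```

On this input with β = 0.5, the candidate list contains 1 together with the false positive 2 (relative frequency 2/9), and the verification pass keeps only 1. For a string, the case m = 1, pass a single-row array and set r1 = r2 = 0.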