Ordered and unordered top-K range reporting in large data sets

  • Authors:
  • Peyman Afshani;Gerth Stølting Brodal;Norbert Zeh

  • Affiliations:
  • Dalhousie University, Halifax, NS, Canada;Aarhus University, Denmark;Dalhousie University, Halifax, NS, Canada

  • Venue:
  • Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

We study the following problem: Given an array A storing N real numbers, preprocess it to allow fast reporting of the K smallest elements in the subarray A[i, j] in sorted order, for any triple (i, j, K) with 1 ≤ i ≤ j ≤ N and 1 ≤ K ≤ j − i + 1. We are interested in scenarios where the array A is large, necessitating an I/O-efficient solution. For a parameter f with 1 ≤ f ≤ logm n, we construct a data structure that uses O((N/f) logm n) space and achieves a query bound of O(logB N + fK/B) I/Os, where B is the block size, M is the size of the main memory, n:= N/B, and m:= M/B. Our main contribution is to show that this solution is nearly optimal. To be precise, we show that achieving a query bound of O(logα n + fK/B) I/Os, for any constant α, requires Ω(Nf−1 logM n/log(f−1 logM n)) space, assuming B = Ω(log N). For M ≥ B1+ε, this is within a log logm n factor of the upper bound. The lower bound assumes indivisibility of records and holds even if we assume K is always set to j − 1 + 1. We also show that it is the requirement that the K smallest elements be reported in sorted order which makes the problem hard. If the K smallest elements in the query range can be reported in any order, then we can obtain a linear-size data structure with a query bound of O(logB N + K/B) I/Os.