From discrepancy to declustering: Near-optimal multidimensional declustering strategies for range queries

  • Authors:
  • Chung-Min Chen; Christine T. Cheng

  • Affiliations:
  • Telcordia Technologies, Piscataway, New Jersey; University of Wisconsin-Milwaukee, Milwaukee, Wisconsin

  • Venue:
  • Journal of the ACM (JACM)
  • Year:
  • 2004

Abstract

Declustering schemes allocate data blocks among multiple disks to enable parallel retrieval. Given a declustering scheme D, its response time with respect to a query Q, rt(Q), is defined to be the maximum number of data blocks of the query stored by the scheme in any one of the disks. If |Q| is the number of data blocks in Q and M is the number of disks, then rt(Q) is at least ⌈|Q|/M⌉. One way to evaluate the performance of D with respect to a set of range queries 𝒬 is to measure its additive error: the maximum difference of rt(Q) from ⌈|Q|/M⌉ over all range queries Q ∈ 𝒬.

In this article, we consider the problem of designing declustering schemes for uniform multidimensional data arranged in a d-dimensional grid so that their additive errors with respect to range queries are as small as possible. It has been shown that for a fixed dimension d ≥ 2, any declustering scheme on an M^d grid, a grid with length M on each dimension, will always incur an additive error with respect to range queries of Ω(log M) when d = 2 and Ω(log^{(d−1)/2} M) when d > 2.

Asymptotically optimal declustering schemes exist for 2-dimensional data. However, the best general upper bound known so far for the worst-case additive errors of d-dimensional declustering schemes, d ≥ 3, is O(M^{d−1}), which is large when compared to the lower bound. In this article, we propose two declustering schemes based on low-discrepancy points in d dimensions. When d is fixed, both schemes have an additive error of O(log^{d−1} M) with respect to range queries, provided that certain conditions are satisfied: the first scheme requires that the side lengths of the grid grow at a rate polynomial in M, while the second scheme requires d ≥ 2 and M = p^t, where d ≤ p ≤ C, C a constant, and t is a positive integer such that t(d − 1) ≥ 2. These are the first multidimensional declustering schemes with additive errors proven to be near optimal.
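
The definitions of response time and additive error can be made concrete with a small brute-force sketch. This is an illustration only, not either of the schemes proposed in the article: the function names are hypothetical, the grid is assumed to be a small cube of side `side`, and the example scheme is the simple "disk modulo" assignment (sum of coordinates mod M), chosen here only because it is easy to state. The enumeration is exponential in the dimension, so it is only practical for tiny grids.

```python
from itertools import product
from math import ceil


def response_time(scheme, query, num_disks):
    """rt(Q): the largest number of blocks of the query held by any one disk.

    `query` is a tuple of half-open ranges (lo, hi), one per dimension.
    `scheme` maps a grid cell (a tuple of coordinates) to a disk index.
    """
    counts = [0] * num_disks
    for cell in product(*(range(lo, hi) for lo, hi in query)):
        counts[scheme(cell)] += 1
    return max(counts)


def additive_error(scheme, side, dims, num_disks):
    """Maximum of rt(Q) - ceil(|Q| / M) over all range queries on the grid."""
    # Every non-empty half-open interval along one dimension.
    intervals = [(lo, hi) for lo in range(side) for hi in range(lo + 1, side + 1)]
    worst = 0
    for query in product(intervals, repeat=dims):
        size = 1
        for lo, hi in query:
            size *= hi - lo
        optimum = ceil(size / num_disks)
        worst = max(worst, response_time(scheme, query, num_disks) - optimum)
    return worst


def disk_modulo(num_disks):
    """Illustrative scheme (an assumption, not from the article):
    assign each cell to disk (sum of its coordinates) mod M."""
    return lambda cell: sum(cell) % num_disks
```

For example, with M = 4 disks the 2 × 2 query `((0, 2), (0, 2))` under `disk_modulo(4)` places two of its four blocks on disk 1, so `response_time` returns 2 while ⌈|Q|/M⌉ = 1, giving an error of 1 for that query.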