From discrepancy to declustering: Near-optimal multidimensional declustering strategies for range queries

Authors:
Chung-Min Chen;Christine T. Cheng
Affiliations:
Telcordia Technologies, Piscataway, New Jersey;University of Wisconsin-Milwaukee, Milwaukee, Wisconsin
Venue:
Journal of the ACM (JACM)
Year:
2004

Abstract

Declustering schemes allocate data blocks among multiple disks to enable parallel retrieval. Given a declustering scheme $D$, its response time with respect to a query $Q$, $rt(Q)$, is defined to be the maximum number of data blocks of the query stored by the scheme in any one of the disks. If $|Q|$ is the number of data blocks in $Q$ and $M$ is the number of disks, then $rt(Q)$ is at least $\lceil |Q|/M \rceil$. One way to evaluate the performance of $D$ with respect to a set of range queries $\mathcal{Q}$ is to measure its additive error: the maximum difference of $rt(Q)$ from $\lceil |Q|/M \rceil$ over all range queries $Q \in \mathcal{Q}$.

In this article, we consider the problem of designing declustering schemes for uniform multidimensional data arranged in a $d$-dimensional grid so that their additive errors with respect to range queries are as small as possible. It has been shown that for a fixed dimension $d \geq 2$, any declustering scheme on an $M^d$ grid, a grid with length $M$ on each dimension, will always incur an additive error with respect to range queries of $\Omega(\log M)$ when $d = 2$ and $\Omega(\log^{(d-1)/2} M)$ when $d > 2$.

Asymptotically optimal declustering schemes exist for 2-dimensional data. However, the best general upper bound known so far for the worst-case additive errors of $d$-dimensional declustering schemes, $d \geq 3$, is $O(M^{d-1})$, which is large when compared to the lower bound. In this article, we propose two declustering schemes based on low-discrepancy points in $d$ dimensions. When $d$ is fixed, both schemes have an additive error of $O(\log^{d-1} M)$ with respect to range queries, provided that certain conditions are satisfied: the first scheme requires that the side lengths of the grid grow at a rate polynomial in $M$, while the second scheme requires $d \geq 2$ and $M = p^t$, where $d \leq p \leq C$, $C$ a constant, and $t$ is a positive integer such that $t(d - 1) \geq 2$. These are the first multidimensional declustering schemes with additive errors proven to be near optimal.
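
The response-time and additive-error definitions in the abstract can be illustrated with a small brute-force sketch. It is not one of the schemes proposed in the article. The function names are hypothetical, and the example scheme is the classical disk-modulo allocation (the sum of a cell's coordinates modulo the number of disks), used here only as a simple stand-in. The sketch enumerates every range query on a tiny grid, so it is practical only for very small grids.

```python
from itertools import product


def response_time(scheme, query, num_disks):
    """rt(Q): the largest number of the query's blocks stored on any one disk."""
    counts = [0] * num_disks
    for cell in query:
        counts[scheme(cell)] += 1
    return max(counts)


def range_queries(side, dim):
    """Yield every axis-aligned range query on a side^dim grid as a list of cells."""
    intervals = [(lo, hi) for lo in range(side) for hi in range(lo, side)]
    for box in product(intervals, repeat=dim):
        yield list(product(*(range(lo, hi + 1) for lo, hi in box)))


def additive_error(scheme, side, dim, num_disks):
    """Maximum of rt(Q) - ceil(|Q| / M) over all range queries Q on the grid."""
    worst = 0
    for query in range_queries(side, dim):
        optimal = -(-len(query) // num_disks)  # integer ceiling of |Q| / M
        worst = max(worst, response_time(scheme, query, num_disks) - optimal)
    return worst


def disk_modulo(num_disks):
    """Illustrative stand-in scheme (an assumption, not from the article)."""
    return lambda cell: sum(cell) % num_disks


if __name__ == "__main__":
    # Additive error of disk modulo on a 4 x 4 grid with M = 4 disks.
    print(additive_error(disk_modulo(4), side=4, dim=2, num_disks=4))
```

Passing a different `scheme` function, one that maps a grid cell (a tuple of coordinates) to a disk index, allows other allocations to be compared under the same measure.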