From discrepancy to declustering: near-optimal multidimensional declustering strategies for range queries

Authors:
Chung-Min Chen;Christine T. Cheng
Affiliations:
Telcordia Technologies, Morristown, NJ;Institute for Mathematics & its Applications, Minneapolis, MN
Venue:
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2002

Abstract

Declustering schemes allocate data blocks among multiple disks to enable parallel retrieval. Given a declustering scheme $D$, its response time with respect to a query $Q$, $rt(Q)$, is defined to be the maximum number of blocks of $Q$ that the scheme stores on any one disk. If $|Q|$ is the number of data blocks in $Q$ and $M$ is the number of disks, then $rt(Q)$ is at least $|Q|/M$. One way to evaluate the performance of $D$ with respect to a set of queries $\mathcal{Q}$ is to measure its additive error: the maximum difference between $rt(Q)$ and $|Q|/M$ over all range queries $Q \in \mathcal{Q}$.

In this paper, we consider the problem of designing declustering schemes for uniform multidimensional data arranged in a $d$-dimensional grid so that their additive errors with respect to range queries are as small as possible. It has been shown that such declustering schemes will have an additive error of $\Omega(\log M)$ when $d = 2$ and $\Omega(\log^{(d-1)/2} M)$ when $d > 2$ with respect to range queries.

Asymptotically optimal declustering schemes exist for 2-dimensional data. For data in higher dimensions, however, the best known bound for additive errors is $O(M^{d-1})$, which is extremely large. In this paper, we propose two declustering schemes based on low-discrepancy points in $d$ dimensions. When $d$ is fixed, both schemes have an additive error of $O(\log^{d-1} M)$ with respect to range queries, provided certain conditions are satisfied: the first scheme requires $d \geq 3$ and $M$ to be a power of a prime, where the prime is at least $d$, while the second scheme requires the size of the data to grow within some polynomial of $M$, with no restriction on $M$. These are the first known multidimensional declustering schemes with near-optimal additive errors.
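
To make the definitions of response time and additive error concrete, the following is a minimal Python sketch that evaluates them by brute force on a small grid. The allocation used, `disk_modulo`, is a simple modulo-style rule chosen purely for illustration; it is not one of the two schemes proposed in the paper, and all function names here are hypothetical.

```python
from fractions import Fraction
from itertools import product


def disk_modulo(cell, num_disks):
    """Illustrative allocation only (not the paper's scheme):
    place a grid cell on disk (sum of its coordinates) mod M."""
    return sum(cell) % num_disks


def response_time(scheme, lo, hi, num_disks):
    """rt(Q): the largest number of blocks of the range query
    [lo, hi] (inclusive corners) stored on any single disk."""
    counts = [0] * num_disks
    for cell in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
        counts[scheme(cell, num_disks)] += 1
    return max(counts)


def additive_error(scheme, shape, num_disks):
    """Maximum of rt(Q) - |Q|/M over every range query on a grid
    of the given shape. Exhaustive, so only usable on tiny grids."""
    corners = list(product(*(range(n) for n in shape)))
    worst = Fraction(0)
    for lo in corners:
        for hi in corners:
            if not all(a <= b for a, b in zip(lo, hi)):
                continue  # not a valid (lower, upper) corner pair
            size = 1
            for a, b in zip(lo, hi):
                size *= b - a + 1
            rt = response_time(scheme, lo, hi, num_disks)
            worst = max(worst, rt - Fraction(size, num_disks))
    return worst


# A 2 x 2 query on 4 disks: two of its four blocks land on disk 1,
# so rt(Q) = 2 while |Q|/M = 1.
print(response_time(disk_modulo, (0, 0), (1, 1), 4))  # → 2
```

The exhaustive search enumerates every pair of corners, so its cost grows quickly with the grid size; it is meant only to illustrate the quantities the abstract bounds, not to evaluate schemes at scale.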