Optimal distributed declustering using replication

Authors:
Keith B. Frikken
Affiliations:
CERIAS and Department of Computer Sciences, Purdue University, West Lafayette, IN
Venue:
ICDT'05 Proceedings of the 10th international conference on Database Theory
Year:
2005

Abstract

A common technique for improving the performance of database query retrieval is to decluster the database among multiple disks so that retrievals can be parallelized. In this paper we focus on answering range queries over a multidimensional database in which each dimension is divided uniformly to obtain tiles that are placed on different disks; a significant amount of research has addressed how to place the records on disks to minimize retrieval time. Recently, the idea of using replication (i.e., placing records on more than one disk) to improve performance has been introduced. When using replication there are two goals: i) to minimize the retrieval time and ii) to minimize the scheduling overhead of determining which disk each record is read from when processing a query. The previously known replicated declustering schemes with low retrieval times are randomized, and one of the primary advantages of randomized schemes is that, with high probability, they balance the load evenly among the disks for large queries. In this paper we introduce a new class of replicated placement schemes, called the shift schemes, that i) are deterministic, ii) have retrieval performance comparable to that of the randomized schemes, iii) have a strictly optimal retrieval time for all large queries, and iv) have a more efficient query scheduling algorithm than those for the randomized placements. Furthermore, we present experimental results suggesting that the shift schemes have stronger average performance (in terms of retrieval times) than the randomized schemes.
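The two goals named in the abstract, retrieval time and scheduling overhead, can be illustrated with a small sketch. It is not the paper's shift scheme: it assumes a two-dimensional grid, uses the simple placement (x + y) mod M for the first copy, and places a second copy a fixed, hypothetical `offset` away. All function names are illustrative. Retrieval time is measured as the largest number of tiles any single disk must serve, and the replicated scheduler is a brute-force search over copy choices, which is exactly the kind of overhead an efficient scheduling algorithm is meant to avoid.

```python
from itertools import product
from math import ceil


def disk_modulo(x, y, num_disks):
    """Illustrative placement: tile (x, y) is stored on disk (x + y) mod M."""
    return (x + y) % num_disks


def tiles_in(query):
    """Tiles covered by a range query given as ((x0, x1), (y0, y1)), half-open."""
    (x0, x1), (y0, y1) = query
    return list(product(range(x0, x1), range(y0, y1)))


def optimal_time(query, num_disks):
    """Lower bound: tiles spread perfectly evenly over all disks."""
    return ceil(len(tiles_in(query)) / num_disks)


def retrieval_time(query, num_disks):
    """Without replication: the busiest disk determines the retrieval time."""
    load = [0] * num_disks
    for x, y in tiles_in(query):
        load[disk_modulo(x, y, num_disks)] += 1
    return max(load)


def replicated_time(query, num_disks, offset):
    """With two copies per tile (the second `offset` disks away, an assumption),
    try every choice of copy per tile and keep the best schedule.
    Exponential in the query size, so only usable for tiny queries."""
    tiles = tiles_in(query)
    best = len(tiles)
    for choice in product((0, 1), repeat=len(tiles)):
        load = [0] * num_disks
        for (x, y), use_replica in zip(tiles, choice):
            disk = (disk_modulo(x, y, num_disks) + use_replica * offset) % num_disks
            load[disk] += 1
        best = min(best, max(load))
    return best


query = ((0, 2), (0, 2))                  # a 2 x 2 block of tiles
print(optimal_time(query, 4))             # 1
print(retrieval_time(query, 4))           # 2
print(replicated_time(query, 4, 2))       # 1
```

In this toy case the single-copy placement needs two parallel rounds because two tiles share a disk, while the replicated placement reaches the lower bound of one round. The brute-force search grows as 2^n in the number of tiles, which shows why the cost of scheduling matters alongside the retrieval time itself.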