Optimal distributed declustering using replication

  • Authors:
  • Keith B. Frikken

  • Affiliations:
  • CERIAS and Department of Computer Sciences, Purdue University, West Lafayette, IN

  • Venue:
  • ICDT'05 Proceedings of the 10th international conference on Database Theory
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

A common technique for improving performance for database query retrieval is to decluster the database among multiple disks so that retrievals can be parallelized. In this paper we focus on answering range queries over a multidimensional database, where each of its dimensions are divided uniformly to obtain tiles which are placed on different disks; there has been a significant amount of research for determining how to place the records on disks to minimize the retrieval time. Recently, the idea of using replication (i.e., placing records on more than one disk) to improve performance has been introduced. When using replication there are two goals: i) to minimize the retrieval time and ii) to minimize the scheduling overhead it takes to determine which disk obtains a specific record when processing a query. The previously known replicated declustering schemes with low retrieval times are randomized; and one of the primary advantages of randomized schemes is that they balance the load evenly among the disks for large queries with high probability. In this paper we introduce a new class of replicated placement schemes called the shift schemes that are: i) deterministic, ii) have retrieval performance that is comparable to the randomized schemes, iii) have a strictly optimal retrieval time for all large queries, and iv) have a more efficient query scheduling algorithm than those for the randomized placements. Furthermore, we display experimental results that suggest that the shift schemes have stronger average performance (in terms of retrieval times) than the randomized schemes.