Replicated declustering of spatial data

  • Authors:
  • Hakan Ferhatosmanoǧlu;Ali Şaman Tosun;Aravind Ramachandran

  • Affiliations:
  • Ohio State University, Columbus, OH;University of Texas, San Antonio, TX;Ohio State University, Columbus, OH

  • Venue:
  • PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of disk declustering is to distribute data among multiple disks to reduce query response times through parallel I/O. A strictly optimal declustering technique is one that achieves optimal parallel I/O for all possible queries. In this paper, we focus on techniques that are optimized for spatial range queries. Current declustering techniques, which have single copies of the data, have been shown to be suboptimal for range queries. The lower bound on extra disk accesses is proved to be Ω(log N) for N disks even in the restricted case of an N-by-N grid, and all current approaches have been trying to achieve this bound. Replication is a well-known and effective solution for several problems in databases, especially for availability and load balancing. In this paper, we explore the idea of replication in the context of declustering and propose a framework where strictly optimal parallel I/O is achievable using a small amount of replication. We provide some theoretical foundations for replicated declustering, e.g., a bound for number of copies for strict optimality on any number of disks, and propose a class of replicated declustering schemes, periodic allocations, which are shown to be strictly optimal. The results for optimal disk allocation are extended for larger number of disks by increasing replication. Our techniques and results are valid for any arbitrary a-by-b grids, and any declustering scheme can be further improved using our replication framework. Using the framework, we perform experiments to identify a strictly optimal disk access schedule for any given arbitrary range query. In addition to the theoretical bounds, we compare the proposed replication based scheme to other existing techniques by performing experiments on real datasets.