Optimal Parallel I/O Using Replication

  • Authors:
  • Affiliations:
  • Venue:
  • ICPPW '02 Proceedings of the 2002 International Conference on Parallel Processing Workshops
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

There has been a lot of interest in declustering spatial data for efficient parallel I/O.Declustering is used to distribute blocks of data among multiple devices, thus enabling parallel I/O access and reducing query response times. A strictly optimal declustering, or disk allocation, technique is the one that achieves optimal performance for all possible queries. It has been proved that it is impossible to reach strict optimality for parallel I/O in general scenarios, and the lower bound on extra disk accesses is proved to be \Alpha(log m) for m disks even in the restricted case of m-by-m grid. Therefore, all current approaches have been trying to achieve this bound. In this paper, we propose to use replication to reach optimal parallel I/O in multi-disk/processor architectures. Replication is a well-known and effective solution for several problems in a database context, especially for availability and load balancing problems. We explore the idea of replication in the context of declustering and propose optimal declustering techniques using intelligent replication. We investigate whether strictly optimal parallel I/O is achievable using a small amount of replication. We especially focus on allocations based on latin squares and derive several nice properties for replication and declustering purposes. Three different replication strategies are proposed and evaluated. Using the proposed schemes, strict optimality is reached, even with a single replication, for all possible range queries and for several number of disks. We first show this for m-by-m grid on m disks, and then generalize it to any arbitrary a-by-b grids. We also show how to efficiently find optimal disk accesses for a given arbitrary query by storing minimal information.