Optimal Parallel I/O Using Replication

Venue: ICPPW '02 Proceedings of the 2002 International Conference on Parallel Processing Workshops
Year: 2002

Cited by (11):
- Replicated declustering for arbitrary queries. Proceedings of the 2004 ACM symposium on Applied computing.
- Replicated declustering of spatial data. PODS '04 Proceedings of the twenty-third ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems.
- Optimal data-space partitioning of spatial data for parallel I/O. Distributed and Parallel Databases.
- Efficient retrieval of replicated data. Distributed and Parallel Databases.
- Efficient parallel processing of range queries through replicated declustering. Distributed and Parallel Databases.
- Threshold-based declustering. Information Sciences: an International Journal.
- Equivalent disk allocations. Proceedings of the 2007 ACM symposium on Applied computing.
- A global and parallel file system for grids. Future Generation Computer Systems, Special section: Data mining in grid computing environments.
- Divide-and-conquer scheme for strictly optimal retrieval of range queries. ACM Transactions on Storage (TOS).
- Threshold based declustering in high dimensions. DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications.
- Generalized Optimal Response Time Retrieval of Replicated Data from Storage Arrays. ACM Transactions on Storage (TOS).

Abstract

There has been considerable interest in declustering spatial data for efficient parallel I/O. Declustering distributes blocks of data among multiple devices, enabling parallel I/O access and reducing query response times. A strictly optimal declustering (disk allocation) technique is one that achieves optimal performance for all possible queries. It has been proved that strict optimality for parallel I/O is impossible to reach in general, and that the lower bound on extra disk accesses is $\Omega(\log m)$ for m disks, even in the restricted case of an m-by-m grid. Consequently, current approaches aim to achieve this bound. In this paper, we propose using replication to reach optimal parallel I/O in multi-disk/multi-processor architectures. Replication is a well-known and effective solution to several problems in database systems, notably availability and load balancing. We explore replication in the context of declustering and propose optimal declustering techniques based on intelligent replication. We investigate whether strictly optimal parallel I/O is achievable with a small amount of replication. We focus in particular on allocations based on latin squares and derive several useful properties for replication and declustering. Three replication strategies are proposed and evaluated. With the proposed schemes, strict optimality is reached for all possible range queries, even with a single replica, for several numbers of disks. We first show this for an m-by-m grid on m disks, and then generalize it to arbitrary a-by-b grids. We also show how to efficiently find the optimal disk accesses for an arbitrary query while storing minimal information.
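The notions above can be made concrete with a small sketch. A query Q on m disks can never be answered in fewer than ceil(|Q|/m) parallel block reads, and a declustering is strictly optimal when every range query meets that bound. With replication, each block has several candidate disks, so choosing which copy to read becomes a bipartite matching problem. The Python below is a minimal sketch under stated assumptions: response time is taken to be the maximum number of blocks read from any one disk, the primary allocation is the cyclic latin square (i + j) mod m (disk modulo), and the replica is the same square shifted by a constant. These placements and all helper names (`cyclic_latin_square`, `optimal_response_time`, `non_optimal_queries`) are hypothetical illustrations, not the paper's three replication strategies.

```python
from math import ceil


def cyclic_latin_square(m, shift=0):
    """Hypothetical placement: cell (i, j) -> disk (i + j + shift) mod m.

    Every row and column uses each of the m disks exactly once, so this is a
    latin square (shift=0 is the classic disk-modulo allocation)."""
    return [[(i + j + shift) % m for j in range(m)] for i in range(m)]


def optimal_response_time(cells, copies, m):
    """Smallest t such that each cell is read from one of its copies and no
    disk serves more than t cells. Each disk is split into t unit slots and
    cells are matched to slots with augmenting paths (Kuhn's algorithm)."""
    n = len(cells)
    t = ceil(n / m)  # lower bound: n reads shared by m disks
    while True:
        owner = {}  # slot (disk, k) -> index of the cell using that slot

        def augment(c, seen):
            for d in copies[cells[c]]:
                for k in range(t):
                    slot = (d, k)
                    if slot in seen:
                        continue
                    seen.add(slot)
                    if slot not in owner or augment(owner[slot], seen):
                        owner[slot] = c
                        return True
            return False

        if all(augment(c, set()) for c in range(n)):
            return t
        t += 1  # some cell could not be placed; allow one more read per disk


def range_queries(m):
    """Yield every axis-aligned range query on an m-by-m grid as a cell list."""
    for r1 in range(m):
        for r2 in range(r1, m):
            for c1 in range(m):
                for c2 in range(c1, m):
                    yield [(i, j) for i in range(r1, r2 + 1)
                           for j in range(c1, c2 + 1)]


def non_optimal_queries(m, allocations):
    """Count range queries whose best schedule exceeds ceil(|Q| / m)."""
    copies = {(i, j): {a[i][j] for a in allocations}
              for i in range(m) for j in range(m)}
    return sum(1 for q in range_queries(m)
               if optimal_response_time(q, copies, m) > ceil(len(q) / m))


if __name__ == "__main__":
    m = 4
    primary = cyclic_latin_square(m)
    replica = cyclic_latin_square(m, shift=2)  # hypothetical replica choice
    print("single copy:", non_optimal_queries(m, [primary]))
    print("with replica:", non_optimal_queries(m, [primary, replica]))
```

For example, on 4 disks the 2-by-2 query at the grid origin needs 2 reads from disk 1 under disk modulo alone, but with the shifted replica its four blocks can be read from four distinct disks in a single round. Since an extra copy only widens each block's choice of disks, adding a replica can never slow a query's optimal schedule; the open question is which placements drive the non-optimal count to zero. Kuhn's matching over per-disk slots favors brevity over speed, and a max-flow formulation would scale better to large grids.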