Replication and retrieval strategies of multidimensional data on parallel disks

Authors:
Chung-Min Chen; Christine T. Cheng
Affiliations:
Telcordia Technologies; University of Wisconsin-Milwaukee, WI
Venue:
CIKM '03: Proceedings of the Twelfth International Conference on Information and Knowledge Management
Year:
2003

Abstract

Aside from enhancing data availability during disk failures, replication of data is also used to speed up the I/O performance of read-intensive applications. Two issues must be addressed: (a) data placement (which disks should store the copies of each data block?) and (b) scheduling (given a query Q and a placement scheme P of the data, from which disk should each block in Q be retrieved so that retrieval time is minimized?). In this paper, we consider range queries and assume that the dataset is a multidimensional grid and that r copies of each unit block of the grid must be stored among M disks. To measure the performance of a scheduling algorithm accurately, we use a metric that accounts for the scheduling overhead as well as the time it takes to retrieve the data blocks from the disks. We describe several combinations of data placement schemes and scheduling algorithms and analyze their performance for range queries with respect to this metric. We then present simulation results for the most interesting case, r = 2, showing that these strategies perform better than the previously known method, especially for large queries.
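The placement and scheduling problems described above can be illustrated with a small sketch for a 2-dimensional grid and r = 2. It is not the authors' method. The placement is a hypothetical shifted disk-modulo scheme: copy 1 of block (i, j) goes to disk (i + j) mod M, and copy 2 goes to disk (i + j + shift) mod M, where `shift` is an assumed parameter that must not be a multiple of M. Two schedulers are compared. The first is a cheap greedy rule that reads each block from the currently less loaded of its two disks. The second is an exact scheduler that finds the smallest per-disk load T by checking bipartite-matching feasibility (Kuhn's augmenting paths, with each disk split into T unit slots). All function names here are illustrative assumptions. Response time is measured only as the maximum number of blocks read from any single disk, so the sketch models the retrieval part of the metric and leaves out scheduling overhead.

```python
from math import ceil


def place(i, j, M, shift):
    """Hypothetical r = 2 placement: disk-modulo for copy 1, shifted disk-modulo for copy 2."""
    return ((i + j) % M, (i + j + shift) % M)


def range_query(x0, y0, x1, y1):
    """All grid blocks (i, j) with x0 <= i <= x1 and y0 <= j <= y1."""
    return [(i, j) for i in range(x0, x1 + 1) for j in range(y0, y1 + 1)]


def single_copy_load(blocks, M):
    """Baseline without replication: every block is read from its copy-1 disk."""
    load = [0] * M
    for (i, j) in blocks:
        load[(i + j) % M] += 1
    return max(load)


def greedy_schedule(blocks, M, shift):
    """Linear-time heuristic: read each block from whichever copy's disk is less loaded so far."""
    load = [0] * M
    for (i, j) in blocks:
        d1, d2 = place(i, j, M, shift)
        d = d1 if load[d1] <= load[d2] else d2
        load[d] += 1
    return max(load)


def _feasible(cands, T):
    """True if every block can be read from one of its disks with at most T reads per disk.

    Kuhn's augmenting-path bipartite matching, with each disk split into T unit slots.
    """
    slot_owner = {}  # (disk, k) -> index of the block currently using that slot

    def try_assign(b, seen):
        for d in cands[b]:
            for k in range(T):
                s = (d, k)
                if s in seen:
                    continue
                seen.add(s)
                # Take a free slot, or evict its owner if the owner can move elsewhere.
                if s not in slot_owner or try_assign(slot_owner[s], seen):
                    slot_owner[s] = b
                    return True
        return False

    return all(try_assign(b, set()) for b in range(len(cands)))


def optimal_schedule(blocks, M, shift):
    """Smallest achievable maximum per-disk load for the query (optimal response time in block reads)."""
    cands = [place(i, j, M, shift) for (i, j) in blocks]
    T = ceil(len(blocks) / M)  # no schedule can do better than a perfectly even spread
    while not _feasible(cands, T):
        T += 1  # feasibility is monotone in T, so the first feasible T is optimal
    return T


if __name__ == "__main__":
    M, shift = 5, 2
    Q = range_query(0, 0, 1, 1)  # a 2 x 2 range query
    print("single copy:", single_copy_load(Q, M))
    print("greedy:", greedy_schedule(Q, M, shift))
    print("optimal:", optimal_schedule(Q, M, shift))
```

For the 2 x 2 query with M = 5 and shift = 2, the single-copy baseline must read two blocks from disk 1, while both schedulers spread the four blocks so that no disk serves more than one. In general, the matching-based scheduler always reaches the optimal load but costs more to run than the greedy rule. That tradeoff is why a metric that charges for scheduling overhead as well as retrieval time can favor a cheaper, slightly suboptimal scheduler.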