Efficient parallel processing of range queries through replicated declustering

Authors:
Hakan Ferhatosmanoglu; Ali Şaman Tosun; Guadalupe Canahuate; Aravind Ramachandran
Affiliations:
Department of Computer Science and Engineering, The Ohio State University, Columbus 43210; Department of Computer Science, University of Texas, San Antonio 78249; Department of Computer Science and Engineering, The Ohio State University, Columbus 43210; Microsoft Corporation, Redmond 98052
Venue:
Distributed and Parallel Databases
Year:
2006

Abstract

A common technique for reducing I/O cost in data-intensive applications is data declustering over parallel servers: the data are distributed among several disks so that query retrieval is parallelized, which improves performance. We focus on optimizing access to large spatial data through the most common type of query on such data, range queries. A declustering scheme is optimal when the processing of every range query is balanced uniformly among the available disks, that is, when a query that touches b buckets reads at most ⌈b/M⌉ of them from any one of the M disks. It has been shown that declustering schemes that store a single copy of the data cannot be optimal for all range queries in general. In this paper, we combine replication with parallel disk declustering for efficient processing of range queries. Replication is already widely used in database applications for purposes such as load balancing, fault tolerance, and data availability. We lay theoretical foundations for replicated declustering and propose a class of replicated declustering schemes, periodic allocations, which we show to be strictly optimal for a number of different disk counts. We also propose a framework for replicated declustering that uses a limited amount of replication, and we extend it to real data, including arbitrary grids and large numbers of disks. The framework also provides an effective indexing scheme that quickly identifies the data of interest on the parallel servers. In addition to optimal processing of single queries, we show that the framework is effective for parallel processing of multiple queries. We present experimental results, on synthetic and real data sets, comparing the proposed replication scheme with other techniques for both single and multiple queries.
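The central ideas of the abstract (declustering grid buckets over M disks, the lower bound of ⌈b/M⌉ reads per disk for a query touching b buckets, and choosing among replicas at query time) can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the paper: buckets form an N × N grid, a "periodic" allocation is taken to have the hypothetical form disk(i, j) = (a·i + b·j) mod M, each bucket has two copies, and retrieval picks one copy per bucket by bipartite matching so as to minimize the load on the busiest disk. The helper names (`periodic`, `fits`, `response_time`, `fraction_optimal`) and the parameter choices are illustrative only.

```python
"""Sketch: periodic declustering with two replicas and optimal replica choice.

Hypothetical assumptions (not the paper's exact definitions): buckets form an
N x N grid, copy k of bucket (i, j) is stored on disk (a_k*i + b_k*j) mod M, and
a range query is answered by picking one copy per bucket so that the busiest
disk reads as few buckets as possible.
"""
from itertools import product
from math import ceil


def periodic(a, b, m):
    """Hypothetical periodic allocation: bucket (i, j) -> disk (a*i + b*j) mod m."""
    return lambda i, j: (a * i + b * j) % m


def query_buckets(i0, j0, h, w):
    """Buckets covered by an h x w range query whose corner is (i0, j0)."""
    return [(i, j) for i in range(i0, i0 + h) for j in range(j0, j0 + w)]


def fits(buckets, copies, m, cap):
    """True if each bucket can be read from one of its copies with <= cap reads per disk.

    Bipartite matching via augmenting paths (Kuhn's algorithm) where every
    disk offers `cap` slots.
    """
    load = {d: [] for d in range(m)}  # disk -> buckets currently assigned to it

    def augment(bucket, seen):
        for d in {f(*bucket) for f in copies}:  # distinct disks holding a copy
            if d in seen:
                continue
            seen.add(d)
            if len(load[d]) < cap:  # free slot: assign directly
                load[d].append(bucket)
                return True
            # Disk is full: try to move one of its buckets to another replica.
            for k, other in enumerate(load[d]):
                if augment(other, seen):
                    load[d][k] = bucket
                    return True
        return False

    return all(augment(bkt, set()) for bkt in buckets)


def response_time(buckets, copies, m):
    """Minimum achievable max per-disk load; never below ceil(len(buckets) / m)."""
    cap = ceil(len(buckets) / m)
    while not fits(buckets, copies, m, cap):
        cap += 1
    return cap


def fraction_optimal(copies, n, m):
    """Fraction of all range queries on an n x n grid answered in ceil(b / m) reads."""
    total = optimal = 0
    for i0, j0 in product(range(n), repeat=2):
        for h, w in product(range(1, n - i0 + 1), range(1, n - j0 + 1)):
            buckets = query_buckets(i0, j0, h, w)
            total += 1
            optimal += response_time(buckets, copies, m) == ceil(len(buckets) / m)
    return optimal / total


if __name__ == "__main__":
    M, N = 4, 8
    single = [periodic(1, 1, M)]                          # disk modulo, one copy
    replicated = [periodic(1, 1, M), periodic(1, 3, M)]   # hypothetical 2nd copy
    q = query_buckets(0, 0, 2, 2)
    print(response_time(q, single, M), response_time(q, replicated, M))  # prints: 2 1
    print(fraction_optimal(single, N, M), fraction_optimal(replicated, N, M))
```

With a single disk-modulo copy, a 2 × 2 query on 4 disks must read two buckets from one disk; adding the hypothetical second copy (i + 3j) mod 4 lets the same query finish with one read per disk. The `fraction_optimal` check only measures optimality empirically on a small grid for the chosen parameters; it is not a proof that a given pair of allocations is strictly optimal.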