Selective Replicated Declustering for Arbitrary Queries

Authors:
K. Yasin Oktay;Ata Turk;Cevdet Aykanat
Affiliations:
Department of Computer Engineering, Bilkent University, Ankara, Turkey 06800;Department of Computer Engineering, Bilkent University, Ankara, Turkey 06800;Department of Computer Engineering, Bilkent University, Ankara, Turkey 06800
Venue:
Euro-Par '09 Proceedings of the 15th International Euro-Par Conference on Parallel Processing
Year:
2009

Abstract

Data declustering is used to minimize query response times in data-intensive applications. The technique parallelizes query retrieval by distributing the data across several disks, and it is useful in applications such as geographic information systems that access huge amounts of data. Replicated declustering extends declustering by allowing data items to be stored on more than one disk. Many replicated declustering schemes have been proposed, and most of them generate two or more copies of every data item. However, some applications have data sets so large that keeping even two copies of every item may not be feasible; in such systems, selective replication is a necessity. Furthermore, existing replication schemes are not designed to exploit query distribution information when it is available. In this study we propose a replicated declustering scheme that, under limited replication capacity, decides both which data items to replicate and how to assign all data items to disks. We use the available query information to guide both replication and partitioning, with the aim of optimizing the aggregate parallel response time. We propose and implement a Fiduccia-Mattheyses-like iterative improvement algorithm to obtain a two-way replicated declustering, and we apply this algorithm in a recursive framework to generate a multi-way replicated declustering. Experiments conducted with arbitrary queries on real datasets show that, especially under low replication constraints, the proposed scheme outperforms existing replicated declustering schemes.
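
To make the objective concrete, the following is a minimal sketch of how aggregate parallel response time could be evaluated for a two-disk layout with selective replication. It is not the authors' Fiduccia-Mattheyses-like algorithm: the improvement step is a plain greedy pass swapped in for illustration, and all names (`query_time`, `aggregate_time`, `greedy_replicate`, `budget`) are hypothetical. It assumes unit retrieval cost per item, that every queried item is stored on at least one disk, and that a replicated item can be read from either disk.

```python
from math import ceil

def query_time(query, on_a, on_b):
    """Best-case parallel response time of one query on two disks (unit cost per item)."""
    only_a = len(query & (on_a - on_b))   # items that must be read from disk A
    only_b = len(query & (on_b - on_a))   # items that must be read from disk B
    # Replicated items can go to either disk, so they only help balance the load.
    return max(only_a, only_b, ceil(len(query) / 2))

def aggregate_time(queries, on_a, on_b):
    """Frequency-weighted sum of response times over all known queries."""
    return sum(freq * query_time(q, on_a, on_b) for q, freq in queries)

def greedy_replicate(queries, on_a, on_b, budget):
    """Illustrative greedy pass (an assumption, not the paper's method):
    replicate at most `budget` single-copy items, best gain first."""
    on_a, on_b = set(on_a), set(on_b)
    for _ in range(budget):
        base = aggregate_time(queries, on_a, on_b)
        best_gain, best_item = 0, None
        for item in sorted(on_a ^ on_b):  # items that currently have one copy
            gain = base - aggregate_time(queries, on_a | {item}, on_b | {item})
            if gain > best_gain:
                best_gain, best_item = gain, item
        if best_item is None:             # no replication improves the objective
            break
        on_a.add(best_item)
        on_b.add(best_item)
    return on_a, on_b

# Hypothetical toy instance: four items, two queries with frequency 1 each.
queries = [(frozenset({1, 2, 3}), 1), (frozenset({1, 4}), 1)]
disk_a, disk_b = {1, 2, 3}, {4}
before = aggregate_time(queries, disk_a, disk_b)            # 4
new_a, new_b = greedy_replicate(queries, disk_a, disk_b, budget=2)
after = aggregate_time(queries, new_a, new_b)               # 3
```

In this toy instance only one of the two allowed replicas is used, because a second replica would not reduce the aggregate cost further. This reflects the selective setting described above, where replication capacity is spent only where the query information shows a benefit.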