A performance analysis of alternative multi-attribute declustering strategies

Authors:
Shahram Ghandeharizadeh;David J. DeWitt;Waheed Qureshi
Affiliations:
Department of Computer Science, University of Southern California;Computer Sciences Department, University of Wisconsin-Madison;Department of Computer Science, University of Southern California
Venue:
SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data
Year:
1992

Abstract

During the past decade, parallel database systems have gained increased popularity due to their high performance, scalability, and availability characteristics. With the predicted future database sizes and the complexity of queries, the scalability of these systems to hundreds and thousands of processors is essential for satisfying the projected demand. Several studies have repeatedly demonstrated that both the performance and scalability of a parallel database system are contingent on the physical layout of data across the processors of the system. If the data is not declustered properly, the execution of an operator might waste resources, reducing the overall processing capability of the system.

With earlier, single-attribute declustering strategies, such as those found in the Tandem, Teradata, Gamma, and Bubba parallel database systems, a selection query including a range predicate on any attribute other than the partitioning attribute must be sent to all processors containing tuples of the relation. By directing a query with minimal resource requirements to processors that contain no relevant tuples, the system wastes CPU cycles, communication bandwidth, and I/O bandwidth, reducing its overall processing capability. As a solution, several multi-attribute declustering strategies have been proposed. However, the performance of these declustering techniques has not previously been compared with one another or with a single-attribute partitioning strategy. This paper compares the performance of the Multi-Attribute GrId deClustering (MAGIC) strategy and Bubba's Extended Range Declustering (BERD) strategy with one another and with the range partitioning strategy. Our results indicate that MAGIC outperforms both range and BERD in all experiments conducted in this study.
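
The routing problem described in the abstract can be illustrated with a small sketch. This is a toy model only, not the MAGIC or BERD algorithm as published: the attribute names `A` and `B`, the boundary values, the four-processor layout, and the function names are all assumptions made for illustration. It shows which processors a range query must visit under single-attribute range partitioning versus a simple two-attribute grid directory.

```python
from bisect import bisect_right


def range_partition_targets(boundaries, part_attr, query_attr, lo, hi):
    """Processors visited under single-attribute range partitioning.

    Processor i holds tuples whose partitioning-attribute value v satisfies
    bisect_right(boundaries, v) == i. (Hypothetical layout.)
    """
    n_procs = len(boundaries) + 1
    if query_attr != part_attr:
        # The predicate says nothing about placement: visit every processor.
        return set(range(n_procs))
    first = bisect_right(boundaries, lo)
    last = bisect_right(boundaries, hi)
    return set(range(first, last + 1))


def grid_targets(row_bounds, col_bounds, directory, query_attr, lo, hi):
    """Processors visited under a toy two-attribute grid directory.

    Rows split attribute "A", columns split attribute "B", and
    directory[r][c] names the processor holding that grid cell.
    """
    rows = range(len(row_bounds) + 1)
    cols = range(len(col_bounds) + 1)
    if query_attr == "A":
        rows = range(bisect_right(row_bounds, lo), bisect_right(row_bounds, hi) + 1)
    elif query_attr == "B":
        cols = range(bisect_right(col_bounds, lo), bisect_right(col_bounds, hi) + 1)
    return {directory[r][c] for r in rows for c in cols}


if __name__ == "__main__":
    # Four processors, attribute values assumed to lie in 0..99.
    range_bounds = [25, 50, 75]      # range partitioning on A
    directory = [[0, 1], [2, 3]]     # 2 x 2 grid over (A, B)

    for attr in ("A", "B"):
        r = range_partition_targets(range_bounds, "A", attr, 10, 20)
        g = grid_targets([50], [50], directory, attr, 10, 20)
        print(f"predicate on {attr}: range -> {sorted(r)}, grid -> {sorted(g)}")
```

In this toy layout, range partitioning localizes a predicate on `A` to one processor but sends a predicate on `B` to all four, while the grid directory sends either predicate to two. Whether that trade-off pays off in practice depends on the workload, which is the question the paper's experiments address.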