MAGIC: A Multiattribute Declustering Mechanism for Multiprocessor Database Machines

Authors:
S. Ghandeharizadeh; D. J. DeWitt
Affiliations:
-;-
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1994

Abstract

During the past decade, parallel database systems have gained increased popularity due to their high performance, scalability, and availability characteristics. With the predicted future database sizes and complexity of queries, the scalability of these systems to hundreds and thousands of processors is essential for satisfying the projected demand. Several studies have repeatedly demonstrated that both the performance and scalability of a parallel database system are contingent on the physical layout of the data across the processors of the system. If the data are not declustered appropriately, the execution of an operation might waste system resources, reducing the overall processing capability of the system. With earlier, single-attribute partitioning mechanisms such as those found in the Tandem, Teradata, Gamma, and Bubba parallel database systems, range selections on any attribute other than the partitioning attribute must be sent to all processors containing tuples of the relation, while range selections on the partitioning attribute can be directed to only a subset of the processors. Although using all the processors for an operation is reasonable for resource-intensive operations, directing a query with minimal resource requirements to processors that contain no relevant tuples wastes CPU cycles, communication bandwidth, and I/O bandwidth. As a solution, this paper describes a new partitioning strategy, multiattribute grid declustering (MAGIC), which can use two or more attributes of a relation to decluster its tuples across multiple processors and disks. In addition, MAGIC declustering, unlike other multiattribute partitioning mechanisms that have been proposed, is able to support range selections as well as exact match selections on each of the partitioning attributes. This capability enables a greater variety of selection operations to be directed to a restricted subset of the processors in the system. Finally, MAGIC partitions each relation based on the resource requirements of the queries that constitute the workload for the relation and the processing capacity of the system in order to ensure that the proper number of processors are used to execute queries that reference the relation.
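
The routing idea in the abstract can be illustrated with a minimal Python sketch of a two-attribute grid directory. This is not the paper's algorithm: the class name `GridDirectory`, the fixed split points, and the row-major assignment of grid cells to processors are all assumptions made for illustration. The abstract says MAGIC derives its partitioning from the workload's resource requirements and the system's processing capacity, and that step is not modeled here.

```python
from bisect import bisect_right
from itertools import product


class GridDirectory:
    """Hypothetical two-attribute grid directory (illustrative only)."""

    def __init__(self, boundaries_a, boundaries_b, num_processors):
        # Each list holds sorted split points; k split points give k + 1 slices.
        self.boundaries = (list(boundaries_a), list(boundaries_b))
        self.num_processors = num_processors

    def _slice(self, dim, value):
        # Index of the slice along dimension `dim` that contains `value`.
        return bisect_right(self.boundaries[dim], value)

    def _owner(self, i, j):
        # Assumed placement: row-major cell number, wrapped onto the processors.
        cols = len(self.boundaries[1]) + 1
        return (i * cols + j) % self.num_processors

    def processor_for(self, a, b):
        """Processor that stores a tuple with attribute values (a, b)."""
        return self._owner(self._slice(0, a), self._slice(1, b))

    def processors_for_range(self, range_a=None, range_b=None):
        """Processors a range selection must visit.

        Each range is an inclusive (low, high) pair; None means the
        query places no restriction on that attribute.
        """
        spans = []
        for dim, rng in enumerate((range_a, range_b)):
            if rng is None:
                spans.append(range(len(self.boundaries[dim]) + 1))
            else:
                low, high = rng
                spans.append(range(self._slice(dim, low), self._slice(dim, high) + 1))
        return {self._owner(i, j) for i, j in product(*spans)}


# A 4 x 4 grid over two attributes, one cell per processor (16 processors).
grid = GridDirectory([25, 50, 75], [1000, 2000, 3000], num_processors=16)
on_a = grid.processors_for_range(range_a=(30, 40))        # restricts attribute A only
on_b = grid.processors_for_range(range_b=(2100, 2900))    # restricts attribute B only
on_both = grid.processors_for_range((30, 40), (2100, 2900))
print(len(on_a), len(on_b), len(on_both))
```

In this configuration a narrow range selection on either attribute visits 4 of the 16 processors, and a selection on both visits one. Under single-attribute range partitioning on the first attribute, the selection on the second attribute would have to be sent to every processor that holds tuples of the relation.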