GeMDA: A Multidimensional Data Partitioning Technique for Multiprocessor Database Systems

Authors:
Yu-Lung Lo, Kien A. Hua, Honesty C. Young
Affiliations:
Yu-Lung Lo: Department of Information Management, Chaoyang University of Technology, 168, GiFeng E. Rd., WuFeng, TaiChung County, Taiwan 413, Republic of China. yllo@cyut.edu.tw
Kien A. Hua: School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816-2362, USA. kienhua@cs.ucf.edu
Honesty C. Young: IBM Research Division, Almaden Research Center, San Jose, CA 95120-6099, USA. young@almaden.ibm.com
Venue:
Distributed and Parallel Databases
Year:
2001

References (26):
Data placement in Bubba. SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data.
Optimal file distribution for partial match retrieval. SIGMOD '88 Proceedings of the 1988 ACM SIGMOD international conference on Management of data.
Fractals for secondary key retrieval. PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems.
An adaptive data placement scheme for parallel database computer systems. Proceedings of the sixteenth international conference on Very large databases.
Principles of distributed database systems (book).
Linear clustering of objects with multiple attributes. SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data.
A benchmark of NonStop SQL release 2 demonstrating near-linear speedup and scaleup on large databases. SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems.
Disk Allocation Methods Using Error Correcting Codes. IEEE Transactions on Computers.
Fragmenting Relations Horizontally Using a Knowledge-Based Approach. IEEE Transactions on Software Engineering.
A performance analysis of alternative multi-attribute declustering strategies. SIGMOD '92 Proceedings of the 1992 ACM SIGMOD international conference on Management of data.
Exploiting database parallelism in a message-passing multiprocessor. IBM Journal of Research and Development.
Optimal disk allocation for partial match queries. ACM Transactions on Database Systems (TODS).
A new fragmentation scheme for recursive query processing. Data & Knowledge Engineering.
A self-adjusting data distribution mechanism for multidimensional load balancing in multiprocessor-based database systems. Information Systems.
Optimizer-assisted load balancing techniques for multicomputer database management systems. Journal of Parallel and Distributed Computing.
The Grid File: An Adaptable, Symmetric Multikey File Structure. ACM Transactions on Database Systems (TODS).
Disk allocation for Cartesian product files on multiple-disk systems. ACM Transactions on Database Systems (TODS).
Declustering using fractals. PDIS '93 Proceedings of the second international conference on Parallel and distributed information systems.
Prototyping Bubba, A Highly Parallel Database System. IEEE Transactions on Knowledge and Data Engineering.
The Gamma Database Machine Project. IEEE Transactions on Knowledge and Data Engineering.
MAGIC: A Multiattribute Declustering Mechanism for Multiprocessor Database Machines. IEEE Transactions on Parallel and Distributed Systems.
Fragmentation of Recursive Relations in Distributed Databases. EDBT '92 Proceedings of the 3rd International Conference on Extending Database Technology: Advances in Database Technology.
Cyclic Allocation of Two-Dimensional Data. ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering.
Disk Allocation Methods for Parallelizing Grid Files. Proceedings of the Tenth International Conference on Data Engineering.
Declustering Objects for Visualization. VLDB '93 Proceedings of the 19th International Conference on Very Large Data Bases.
Concentric Hyperspaces and Disk Allocation for Fast Parallel Range Searching. ICDE '99 Proceedings of the 15th International Conference on Data Engineering.

Cited by (5):
Multidimensional Declustering Schemes Using Golden Ratio and Kronecker Sequences. IEEE Transactions on Knowledge and Data Engineering.
From discrepancy to declustering: Near-optimal multidimensional declustering strategies for range queries. Journal of the ACM (JACM).
Architecture of Parallel Spatial Data Warehouse: Balancing Algorithm and Resumption of Data Extraction. Proceedings of the 2005 conference on Software Engineering: Evolution and Emerging Technologies.
Online balancing of aR-tree indexed distributed spatial data warehouse. PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics.
Scalable store of Java objects using range partitioning. CEE-SET'09 Proceedings of the 4th IFIP TC 2 Central and East European conference on Advances in Software Engineering Techniques.

Abstract

Several studies have demonstrated that both the performance and the scalability of a shared-nothing parallel database system depend on the physical layout of data across the processing nodes of the system. Currently, these systems allocate data using horizontal partitioning strategies, an approach with several drawbacks. If a query's predicate involves the partitioning attribute, typically only a small number of processing nodes can be used to speed up its execution. Conversely, if the predicate of a selection query involves an attribute other than the partitioning attribute, the entire data space must be searched; this, too, wastes computing resources. In recent years, several multidimensional data declustering techniques have been proposed to address these problems. However, these schemes are either too restrictive, as with fieldwise exclusive-or (FX) and error-correcting-code (ECC) allocation, or optimized for a particular type of query, as with Disk Modulo (DM) and the Hilbert curve allocation method (HCAM). In this paper, we introduce a new technique that is flexible and performs well for general queries. We prove its optimality properties and present experimental results showing that our scheme outperforms DM and HCAM by a significant margin.
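The two drawbacks of horizontal partitioning can be made concrete with a small sketch. It is not GeMDA, whose allocation function is not given in this abstract; it compares range partitioning on a single attribute A with Disk Modulo (DM), one of the baselines named above, which assigns grid cell (a, b) to node (a + b) mod M. The 8 x 8 grid over attributes A and B, the 4 nodes, and the assumption that nodes keep no index on B are illustrative choices. The metric is the number of cells the busiest node must read, a rough proxy for response time when nodes work in parallel.

```python
from collections import Counter
from itertools import product

GRID = 8  # hypothetical: intervals per attribute (an 8 x 8 grid of cells)
M = 4     # hypothetical: number of processing nodes


def range_node(a: int, b: int) -> int:
    """Horizontal range partitioning on attribute A only; B plays no role."""
    return a * M // GRID


def dm_node(a: int, b: int) -> int:
    """Disk Modulo (DM) declustering: node = (a + b) mod M."""
    return (a + b) % M


def max_node_load(scheme: str, a=None, b=None) -> int:
    """Largest number of cells one node must read for a partial-match query.

    A query fixes a, b, or both (None means the attribute is unspecified).
    """
    load = Counter()
    for x, y in product(range(GRID), repeat=2):
        if scheme == "range":
            if a is None:
                # Assumption: nodes keep no index on B, so a query that does not
                # fix A is broadcast and every node scans its whole partition.
                load[range_node(x, y)] += 1
            elif x == a and (b is None or y == b):
                load[range_node(x, y)] += 1
        elif scheme == "dm":
            if (a is None or x == a) and (b is None or y == b):
                load[dm_node(x, y)] += 1
        else:
            raise ValueError(f"unknown scheme: {scheme}")
    return max(load.values())


for scheme in ("range", "dm"):
    print(scheme, "A=3:", max_node_load(scheme, a=3),
          "B=3:", max_node_load(scheme, b=3))
```

Under range partitioning, the query on A = 3 places all 8 matching cells on one node (no parallelism), and the query on B = 3 forces every node to scan its full 16-cell partition. Under DM, both queries spread their 8 matching cells evenly, so the busiest node reads 2. This toy model covers only single-value partial-match queries; the abstract's point is that schemes such as DM are tuned to particular query types, which is the gap GeMDA is meant to close.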