Shared-nothing multiprocessor architectures are known to be more scalable than other architectures for supporting very large databases. Compared to other join strategies, a hash-based join algorithm is particularly efficient and easily parallelized for this computation model. However, this architecture is very sensitive to skew in the tuple distribution. Unless the parallel hash join algorithm includes a dynamic load balancing mechanism, skew can severely degrade system performance. In this paper, we investigate this issue and present three parallel hash join algorithms. We implement a simulator to study the effectiveness of these schemes, and we validate the simulation model by comparing its results with those produced by an actual implementation of the algorithms running on a multiprocessor system. Our performance study indicates that a naive approach does not provide tangible savings, whereas carefully designed strategies offer substantial improvement over conventional techniques across a wide range of skew conditions.
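The sensitivity to skew can be illustrated with a minimal sketch. It is not one of the three algorithms presented in the paper; it only shows, under simple assumptions, how hash partitioning of join keys across nodes becomes unbalanced when one key value dominates. The function names `partition_loads` and `skew_ratio`, the modulo hash, the node count, and the synthetic key lists are all assumptions made for illustration.

```python
def partition_loads(join_keys, num_nodes):
    """Count the tuples each node receives under hash partitioning.

    Assumption: keys are non-negative integers and the hash function
    is simply key modulo the number of nodes.
    """
    loads = [0] * num_nodes
    for key in join_keys:
        loads[key % num_nodes] += 1
    return loads


def skew_ratio(loads):
    """Return max load divided by mean load (1.0 means perfectly balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# Synthetic data (hypothetical): 1000 tuples, 4 nodes.
uniform_keys = list(range(1000))             # every key value occurs once
skewed_keys = [7] * 500 + list(range(500))   # half of the tuples share key 7

uniform_loads = partition_loads(uniform_keys, 4)
skewed_loads = partition_loads(skewed_keys, 4)

print(uniform_loads, skew_ratio(uniform_loads))
print(skewed_loads, skew_ratio(skewed_loads))
```

Because a parallel join step finishes only when its most heavily loaded node finishes, a ratio well above 1.0 suggests that much of the machine sits idle while one node works, which is the situation a dynamic load balancing mechanism is meant to avoid.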