Join is an expensive and frequently used operation whose parallelization is highly desirable. However, the effectiveness of parallel joins depends on the ability to divide the load evenly among processors, and data skew can have a disastrous effect on performance. Although many skew-handling algorithms have been proposed, they remain generally inefficient for multi-joins because of join product skew and costly, unnecessary redistribution and communication. A parallel join algorithm called fa_join, with deterministic and near-perfect balancing properties, was introduced in an earlier paper. Despite its advantages, fa_join is sensitive to the correlation of the attribute value distributions in both relations. We present here an improved version of the algorithm, called Sfa_join, which treats both relations symmetrically. Its predictably low join-product and attribute-value skews make it suitable for repeated use in multi-join operations. Its performance is analyzed theoretically and experimentally to confirm its linear speed-up and its superiority over fa_join.
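To make the two kinds of skew named above concrete, the following minimal Python sketch counts per-processor load under plain hash partitioning. It is an illustration only and is not fa_join or Sfa_join: the function names, the integer join keys, and the use of `key % num_procs` as the partitioning function are all assumptions chosen for the example.

```python
from collections import Counter


def partition_loads(keys, num_procs):
    """Attribute-value skew: tuples received per processor.

    Assumes integer join keys and modulo partitioning (hypothetical choice).
    """
    loads = [0] * num_procs
    for k in keys:
        loads[k % num_procs] += 1
    return loads


def join_output_sizes(r_keys, s_keys, num_procs):
    """Join product skew: result tuples produced per processor.

    For an equi-join, a key occurring a times in R and b times in S
    yields a * b result tuples on the processor that owns that key.
    """
    r_freq = Counter(r_keys)
    s_freq = Counter(s_keys)
    out = [0] * num_procs
    for k, r_count in r_freq.items():
        out[k % num_procs] += r_count * s_freq.get(k, 0)
    return out


# Hypothetical data: one uniform relation and one with a heavy key (7).
uniform = list(range(1000))
skewed = [7] * 500 + list(range(500))

print(partition_loads(uniform, 4))            # balanced input
print(partition_loads(skewed, 4))             # one processor is overloaded
print(join_output_sizes(skewed, skewed, 4))   # imbalance grows quadratically
```

With a single heavy key, the input imbalance is linear in the key's frequency, while the output imbalance grows with the product of its frequencies in the two relations. That is why join product skew is particularly damaging when the result feeds another join.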