The development of scalable parallel database systems requires efficient algorithms for the join, which is the most frequent and most expensive operation in relational database systems. The join is also the operation most vulnerable to data skew and to the high cost of communication in distributed architectures. In this paper, we present a new parallel algorithm for join and multi-join operations on distributed architectures, based on an efficient semi-join computation technique. The algorithm is proven to have optimal complexity and to achieve deterministic, perfect load balancing. Its tradeoff between balancing overhead and speedup is analyzed using the BSP cost model, which predicts negligible join product skew and linear speedup. The algorithm improves on our fa_join and sfa_join algorithms by reducing their communication and synchronization costs to a minimum while offering the same load balancing properties, even for highly skewed data.
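To illustrate the general idea of a semi-join-based join, the following is a minimal sketch and not the algorithm presented in the paper. It assumes each relation is already split into per-node fragments of `(key, payload)` tuples and simulates the nodes in a single process. The function names `semi_join_filter` and `partitioned_join` are hypothetical, and the sketch performs no skew handling or load balancing. It only shows how discarding tuples whose key cannot match reduces the data that has to be redistributed before the join.

```python
from collections import defaultdict


def semi_join_filter(r_parts, s_parts):
    """Drop tuples whose join key does not occur in both relations.

    r_parts and s_parts are lists of fragments, one per simulated node.
    A real system would exchange key sets (or histograms) over the
    network; here the exchange is simulated in memory.
    """
    r_keys = {key for part in r_parts for key, _ in part}
    s_keys = {key for part in s_parts for key, _ in part}
    common = r_keys & s_keys  # only these keys can produce join results
    r_reduced = [[t for t in part if t[0] in common] for part in r_parts]
    s_reduced = [[t for t in part if t[0] in common] for part in s_parts]
    return r_reduced, s_reduced


def partitioned_join(r_parts, s_parts, p):
    """Hash-redistribute both relations over p nodes, then join locally."""
    r_buckets = [defaultdict(list) for _ in range(p)]
    s_buckets = [defaultdict(list) for _ in range(p)]
    for part in r_parts:
        for key, payload in part:
            r_buckets[hash(key) % p][key].append(payload)
    for part in s_parts:
        for key, payload in part:
            s_buckets[hash(key) % p][key].append(payload)

    result = []
    for node in range(p):
        for key, r_payloads in r_buckets[node].items():
            for s_payload in s_buckets[node].get(key, []):
                for r_payload in r_payloads:
                    result.append((key, r_payload, s_payload))
    return sorted(result)


if __name__ == "__main__":
    R = [[(1, "a"), (2, "b")], [(3, "c"), (1, "d")]]
    S = [[(1, "x"), (4, "y")], [(3, "z")]]
    R_red, S_red = semi_join_filter(R, S)
    shipped = sum(len(part) for part in R_red + S_red)
    total = sum(len(part) for part in R + S)
    print(f"tuples redistributed: {shipped} of {total}")
    print(partitioned_join(R_red, S_red, p=2))
```

Filtering before redistribution leaves the join result unchanged, because a tuple whose key is missing from the other relation can never contribute to the output. The saving in communication grows with the fraction of such non-matching tuples.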