Effectiveness of Parallel Joins
IEEE Transactions on Knowledge and Data Engineering
New Algorithms for Parallelizing Relational Database Joins in the Presence of Data Skew
IEEE Transactions on Knowledge and Data Engineering
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Practical Skew Handling in Parallel Joins
VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
A Skew-insensitive Algorithm for Join and Multi-join Operations on Shared Nothing Machines
DEXA '00 Proceedings of the 11th International Conference on Database and Expert Systems Applications
Integrating Semi-Join-Reducers into State of the Art Query Processors
Proceedings of the 17th International Conference on Data Engineering
An optimal skew-insensitive join and multi-join algorithm for distributed architectures
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
Toward intersection filter-based optimization for joins in MapReduce
Proceedings of the 2nd International Workshop on Cloud Intelligence
Hi-index | 0.00 |
Semi-joins is the most used technique to optimize the treatment of complex relational queries on distributed architectures. However the overcost related to semi-joins computation can be very high due to data skew and to the high cost of communication in distributed architectures. In this paper we present a parallel equi-semi-join algorithm for shared nothing machines. The performance of this algorithm is analyzed using the BSP cost model and is proved to have asymptotic optimal complexity and perfect load balancing even for highly skewed data. This guarantees unlimited scalability in all situations for this key algorithm.