A Parallel Hash Join Algorithm for Managing Data Skew
IEEE Transactions on Parallel and Distributed Systems
The art of computer programming, volume 3: (2nd ed.) sorting and searching
A Fast Selection Algorithm and the Problem of Optimum Distribution of Effort
Journal of the ACM (JACM)
Effectiveness of Parallel Joins
IEEE Transactions on Knowledge and Data Engineering
A Parallel Sort Merge Join Algorithm for Managing Data Skew
IEEE Transactions on Parallel and Distributed Systems
VLDB '88 Proceedings of the 14th International Conference on Very Large Data Bases
On Disk Allocation of Intermediate Query Results in Parallel Database Systems
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Handling data skew in parallel joins in shared-nothing systems
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Efficient outer join data skew handling in parallel DBMS
Proceedings of the VLDB Endowment
An optimal skew-insensitive join and multi-join algorithm for distributed architectures
DEXA'05 Proceedings of the 16th international conference on Database and Expert Systems Applications
An efficient equi-semi-join algorithm for distributed architectures
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Adaptive MapReduce using situation-aware mappers
Proceedings of the 15th International Conference on Extending Database Technology
Parallel processing is an attractive option for relational database systems. As in any parallel environment, however, load balancing is a critical issue that affects overall performance. Load balancing for one common database operation in particular, the join of two relations, can be severely hampered in conventional parallel algorithms by a natural phenomenon known as data skew. In a pair of recent papers (J. Wolf et al., 1993; 1993), we described two new join algorithms designed to address the data skew problem. In this paper, we propose significant improvements to both algorithms, increasing their effectiveness while simultaneously decreasing their execution times. The paper then focuses on the comparative performance of the improved algorithms and their more conventional counterparts. The improved algorithms outperform their conventional counterparts in the presence of almost any skew, and dramatically so in cases of high skew.
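
To make the data skew problem concrete, the following minimal sketch shows how conventional hash partitioning of join keys can overload one processor when a single join value is very frequent. It is an illustration of the problem only, not the algorithms described in the papers above. The function names, the modulus used as a stand-in hash, the four-processor setup, and the sample key distributions are all assumptions chosen for illustration.

```python
def partition_loads(keys, num_procs):
    """Count tuples routed to each processor under key-based partitioning."""
    loads = [0] * num_procs
    for k in keys:
        # Assumption: integer keys, with modulus as a stand-in hash function.
        loads[k % num_procs] += 1
    return loads


def imbalance(loads):
    """Ratio of the busiest processor's load to the mean load (1.0 is ideal)."""
    return max(loads) / (sum(loads) / len(loads))


# Hypothetical inputs: 1000 tuples each, partitioned across 4 processors.
uniform = list(range(1000))            # every join value appears once
skewed = [7] * 500 + list(range(500))  # half of all tuples share the value 7

uniform_loads = partition_loads(uniform, 4)
skewed_loads = partition_loads(skewed, 4)

print(uniform_loads, imbalance(uniform_loads))
print(skewed_loads, imbalance(skewed_loads))
```

Because all tuples with the same join value must land on the same processor under this scheme, the frequent value cannot be spread out, and the join finishes only when the most heavily loaded processor does.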