Skew handling techniques in sort-merge join

Authors: Wei Li; Dengfeng Gao; Richard Thomas Snodgrass
Affiliations: Oracle Corporation; University of Arizona; University of Arizona
Venue: Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year: 2002

Abstract

Joins are among the most frequently executed operations. Several fast join algorithms have been developed and extensively studied; these can be categorized as sort-merge, hash-based, and index-based algorithms. While all three types of algorithms exhibit excellent performance over most data, ameliorating the performance degradation in the presence of skew has been investigated only for hash-based algorithms. However, for sort-merge join, even a small amount of skew present in realistic data can result in a significant performance hit on a commercial DBMS. This paper examines the negative ramifications of skew in sort-merge join and proposes several refinements that deal effectively with data skew. Experiments show that some of these algorithms also impose virtually no penalty in the absence of data skew and are thus suitable for replacing existing sort-merge implementations. We also show how sort-merge band join performance is significantly enhanced with these refinements.
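To make the skew problem concrete, here is a minimal in-memory sketch of a textbook sort-merge equijoin. It is not the paper's algorithm or any of its proposed refinements, and the function name `sort_merge_join` and the `rescans` counter are hypothetical, introduced only for illustration. The sketch assumes the commonly cited reason that duplicates hurt: when a join key repeats on both sides, the matching run of inner tuples has to be read again for every additional outer tuple with that key. On disk-resident data, each such rescan may cost extra I/O if the run does not fit in memory.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def sort_merge_join(outer: List[Row], inner: List[Row]) -> Tuple[List[Tuple[int, str, str]], int]:
    """Textbook sort-merge equijoin on the first field of each row.

    Returns the joined rows and the number of times a run of equal-key
    inner tuples had to be scanned again (a rough proxy for the extra
    work caused by skew; an illustrative assumption, not a cost model).
    """
    outer = sorted(outer, key=lambda r: r[0])
    inner = sorted(inner, key=lambda r: r[0])
    result: List[Tuple[int, str, str]] = []
    rescans = 0
    i = j = 0
    while i < len(outer) and j < len(inner):
        ok, ik = outer[i][0], inner[j][0]
        if ok < ik:
            i += 1
        elif ok > ik:
            j += 1
        else:
            # Locate the end of the run of inner tuples sharing this key.
            j_end = j
            while j_end < len(inner) and inner[j_end][0] == ok:
                j_end += 1
            # Every outer tuple with this key is paired with the whole run.
            first_pass = True
            while i < len(outer) and outer[i][0] == ok:
                if not first_pass:
                    rescans += 1  # the inner run is read again
                first_pass = False
                for k in range(j, j_end):
                    result.append((ok, outer[i][1], inner[k][1]))
                i += 1
            j = j_end
    return result, rescans


if __name__ == "__main__":
    uniform = [(1, "a"), (2, "b"), (3, "c")]
    skewed = [(1, "a"), (1, "b"), (1, "c")]
    print(sort_merge_join(uniform, [(1, "x"), (2, "y"), (3, "z")])[1])
    print(sort_merge_join(skewed, [(1, "x"), (1, "y"), (1, "z")])[1])
```

With unique keys the inner input is read once, front to back. With every tuple sharing one key, the same inner run is revisited for each additional outer tuple, which is the kind of behavior a skew-aware refinement would try to avoid.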