Efficient outer join data skew handling in parallel DBMS

  • Authors: Yu Xu; Pekka Kostamaa
  • Affiliations: Teradata, San Diego, CA; Teradata, El Segundo, CA
  • Venue: Proceedings of the VLDB Endowment
  • Year: 2009

Abstract

Large enterprises rely on parallel database management systems (PDBMS) to process their ever-increasing data volumes and complex queries. The scalability and performance of a PDBMS come from balancing the load across all nodes in the system. Skewed processing significantly slows query response time and degrades overall system performance. Business intelligence tools used by enterprises frequently generate large numbers of outer joins and require high performance from the underlying database systems. Although extensive research has been done on handling skewed processing for inner joins in PDBMS, there is no known research on data skew handling for parallel outer joins. We propose a simple and efficient outer join algorithm called OJSO (Outer Join Skew Optimization) to improve the performance and scalability of parallel outer joins. Our experimental results show that the OJSO algorithm significantly reduces query elapsed time in the presence of data skew.