Investigating Skew and Scalability in Parallel Joins

  • Authors: Christopher B. Walton
  • Affiliations: -
  • Venue: Investigating Skew and Scalability in Parallel Joins
  • Year: 1989

Abstract

This research will improve understanding of the interaction between data skew and scalability in parallel join algorithms. Previous work in this area assumes that data are uniformly distributed, but data skew is widespread in existing databases. This research makes three major contributions:
1. Several distinct types of skew are identified. Previous work treats skew as a homogeneous phenomenon, but simple analysis shows that each type of skew has a different effect on response time.
2. The relative partition model of skew is defined. It is a simple analytic model that allows worst-case analysis of each type of data skew. The use of this model is demonstrated in an analysis of the sort-merge join algorithm.
3. A systematic plan for investigating skew and scalability is presented. The interplay between simple analytic models and detailed simulations is vital: analytic models bound the results expected from simulation, while more detailed simulation results validate the analytic models.
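
The effect of skew on scalability can be illustrated with a toy calculation. The sketch below is not the relative partition model described in the abstract, whose definition is not given here. It is only a minimal illustration under two stated assumptions: the work done by each processor is proportional to the number of tuples in its partition, and response time is set by the most heavily loaded processor. The function names, the modulo partitioning scheme, and the two data sets are hypothetical.

```python
from collections import Counter


def partition_sizes(keys, num_procs):
    """Count the tuples each processor receives when integer join keys
    are assigned by key modulo the number of processors (an assumed scheme)."""
    counts = Counter(k % num_procs for k in keys)
    return [counts.get(p, 0) for p in range(num_procs)]


def speedup(keys, num_procs):
    """Speedup over a single processor, assuming work is proportional to
    tuple count and response time equals the largest partition."""
    largest = max(partition_sizes(keys, num_procs))
    return len(keys) / largest


# Hypothetical inputs: 1000 tuples each.
uniform = list(range(1000))                # every key value appears once
skewed = [0] * 500 + list(range(1, 501))   # one key value holds half the tuples

for procs in (2, 4, 8):
    print(procs, speedup(uniform, procs), round(speedup(skewed, procs), 2))
```

Under these assumptions the uniform input scales linearly with the number of processors, while the skewed input can never reach a speedup of 2, because all 500 tuples that share one key value land on the same processor no matter how many processors are added.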