Effect of skew on join performance in parallel architectures

  • Authors:
  • M. Seetha Lakshmi; P. S. Yu

  • Affiliations:
  • IBM Research Division, T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, N.Y.; IBM Research Division, T.J. Watson Research Center, P.O. Box 704, Yorktown Heights, N.Y.

  • Venue:
  • DPDS '88 Proceedings of the first international symposium on Databases in parallel and distributed systems
  • Year:
  • 1988

Abstract

Skew in the distribution of values taken by an attribute is identified as a major factor that can affect the performance of parallel architectures for relational joins. The effect of skew on the performance of two parallel architectures is evaluated using analytic models. In the first architecture, called database machine (DBMC), both data and processing power are distributed; in the second, called Single Processor Parallel Input/Output (SPPI), data is distributed but processing power is concentrated in one processor. The two architectures are compared in terms of the ratio of the MIPS used by DBMC and by SPPI to deliver the same throughput and response time. In addition, the horizontal growth potential of DBMC is evaluated in terms of the maximum speedup achievable by DBMC relative to the SPPI response time. Both the MIPS ratio and the speedup are found to be very sensitive to the amount of skew. These results suggest that careful thought should be given to parallelizing database applications and to the design of algorithms and query optimizers for parallel architectures.
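
The sensitivity of speedup to skew can be illustrated with a minimal sketch. This is not the paper's analytic model; it is a simplified illustration under stated assumptions: join-attribute values follow a Zipf-like distribution with a hypothetical skew parameter `theta`, values are assigned to processors round-robin as a stand-in for hash partitioning, and the join finishes only when the most heavily loaded processor finishes. All function names here are illustrative.

```python
def zipf_weights(num_values, theta):
    """Fraction of tuples carrying each distinct join-attribute value.

    Assumption: a Zipf-like distribution. theta = 0 is uniform;
    larger theta concentrates more tuples on a few values.
    """
    raw = [1.0 / (rank ** theta) for rank in range(1, num_values + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def partition_loads(weights, num_processors):
    """Fraction of all tuples that lands on each processor.

    Value i goes to processor i % num_processors, a simple stand-in
    for hashing on the join attribute.
    """
    loads = [0.0] * num_processors
    for i, w in enumerate(weights):
        loads[i % num_processors] += w
    return loads


def speedup_bound(weights, num_processors):
    """Upper bound on speedup over a single processor.

    The slowest processor determines the finish time, so the bound
    is 1 / (largest load fraction).
    """
    return 1.0 / max(partition_loads(weights, num_processors))


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        bound = speedup_bound(zipf_weights(1000, theta), 10)
        print(f"theta={theta}: speedup bound on 10 processors = {bound:.2f}")
```

With no skew the bound equals the number of processors. As `theta` grows, the processor holding the most frequent value becomes the bottleneck and the bound falls well below that, which is the kind of effect the abstract attributes to skew.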