Performance Comparison of Pipelined Hash Joins on Workstation Clusters

  • Authors:
  • Kenji Imasaki; Hong Nguyen; Sivarama P. Dandamudi

  • Venue:
  • HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
  • Year:
  • 2002

Abstract

The traditional hash join algorithm uses a single hash table built on one of the relations participating in the join operation. A variation, the double hash join, was proposed to remedy some of the performance problems of the single hash join. In this paper, we compare the performance of single- and double-pipelined hash joins in a cluster environment. In this environment, nodes are heterogeneous; furthermore, nodes experience dynamic, non-query local background load that can affect the performance of pipelined query execution. Previous studies have shown that the double-pipelined hash join performs substantially better than the single-pipelined hash join when dealing with data from remote sources. However, the relative performance of the two algorithms has not been studied in cluster environments. Our study indicates that, in the type of cluster environment we consider here, the single-pipelined hash join performs as well as or better than the double-pipelined hash join in most cases. We present experimental results on a Pentium cluster and identify these cases.
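
The difference between the two algorithms compared in the abstract can be illustrated with a minimal single-process sketch. It is not the implementation evaluated in the paper: the function names, the tuple layout (join key in position 0), and the round-robin alternation between inputs (standing in for tuples arriving from two sources) are all assumptions made for illustration, and the cluster, heterogeneity, and background-load aspects are not modeled.

```python
from collections import defaultdict
from itertools import zip_longest


def single_pipelined_hash_join(build, probe, key=lambda t: t[0]):
    """One hash table: consume `build` completely, then stream `probe`."""
    table = defaultdict(list)
    for b in build:                      # blocking build phase
        table[key(b)].append(b)
    for p in probe:                      # pipelined probe phase
        for b in table.get(key(p), []):
            yield (b, p)


def double_pipelined_hash_join(left, right, key=lambda t: t[0]):
    """Two hash tables: each arriving tuple is inserted into its own
    table and immediately probed against the other input's table."""
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    missing = object()                   # sentinel for an exhausted input
    # Assumption: inputs are consumed alternately, one tuple at a time.
    for l, r in zip_longest(left, right, fillvalue=missing):
        if l is not missing:
            left_table[key(l)].append(l)
            for m in right_table.get(key(l), []):
                yield (l, m)
        if r is not missing:
            right_table[key(r)].append(r)
            for m in left_table.get(key(r), []):
                yield (m, r)


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (2, "c")]
    S = [(2, "x"), (3, "y"), (1, "z")]
    print(sorted(single_pipelined_hash_join(R, S)))
    print(sorted(double_pipelined_hash_join(R, S)))
```

Both functions produce the same set of joined pairs. The single-table version cannot emit anything until the build input is exhausted, whereas the two-table version can emit a match as soon as both matching tuples have arrived, at the cost of keeping both inputs in memory.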