Performance Comparison of Pipelined Hash Joins on Workstation Clusters

Authors:
Kenji Imasaki; Hong Nguyen; Sivarama P. Dandamudi
Venue:
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Year:
2002

Abstract

The traditional hash join algorithm uses a single hash table built on one of the relations participating in the join operation. A variation, the double-pipelined hash join, builds a hash table on each of the two input relations; it was proposed to remedy some of the performance problems of the single hash join. In this paper, we compare the performance of single- and double-pipelined hash joins in a cluster environment. In this environment, nodes are heterogeneous; furthermore, they experience dynamic, non-query local background load that can degrade pipelined query execution performance. Previous studies have shown that the double-pipelined hash join performs substantially better than the single-pipelined hash join when processing data from remote sources. However, their relative performance has not been studied in cluster environments. Our study indicates that, in the type of cluster environment considered here, the single-pipelined hash join performs as well as or better than the double-pipelined hash join in most cases. We present experimental results on a Pentium cluster and identify these cases.
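To make the difference between the two join strategies concrete, here is a minimal in-memory Python sketch. It is not the authors' implementation: it ignores networking, partitioning across cluster nodes, memory overflow, and background load. It assumes tuples of the form (join_key, payload), and the function names `single_pipelined_hash_join` and `double_pipelined_hash_join` are hypothetical. The round-robin interleaving of the two inputs in the double-pipelined version is likewise an assumption, used only to simulate tuples arriving from both sources.

```python
from collections import defaultdict
from itertools import zip_longest

_MISSING = object()  # sentinel used when one input runs out before the other


def single_pipelined_hash_join(build, probe, key=lambda t: t[0]):
    """Build one hash table on `build`, then stream `probe` through it."""
    table = defaultdict(list)
    for r in build:  # build phase: must consume all of `build` before any output
        table[key(r)].append(r)
    for s in probe:  # probe phase: results are pipelined as `probe` tuples arrive
        for r in table.get(key(s), ()):
            yield (r, s)


def double_pipelined_hash_join(left, right, key=lambda t: t[0]):
    """Symmetric variant: keep one hash table per input. Each arriving tuple
    is inserted into its own table and immediately probes the other table."""
    tables = (defaultdict(list), defaultdict(list))
    # Assumption: tuples arrive alternately from the two inputs (round-robin).
    for pair in zip_longest(left, right, fillvalue=_MISSING):
        for side, t in enumerate(pair):
            if t is _MISSING:
                continue
            k = key(t)
            tables[side][k].append(t)
            for other in tables[1 - side].get(k, ()):
                # Always emit results as (left_tuple, right_tuple).
                yield (t, other) if side == 0 else (other, t)
```

Both functions return the same set of matching pairs; for example, joining `[(1, 'a'), (2, 'b')]` with `[(2, 'x')]` yields `((2, 'b'), (2, 'x'))` either way. The design difference lies in when results appear and what must be kept in memory: the single-pipelined join cannot emit anything until the build input has been read completely, whereas the double-pipelined join can produce a result as soon as matching tuples have arrived from both inputs, at the cost of maintaining a hash table for each input.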