On parallel execution of multiple pipelined hash joins

Authors:
Hui-I Hsiao;Ming-Syan Chen;Philip S. Yu
Affiliations:
IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY;IBM Thomas J. Watson Research Center, P.O. Box 704, Yorktown Heights, NY
Venue:
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Year:
1994

Abstract

In this paper, we study the parallel execution of multiple pipelined hash joins. Specifically, we address two issues that affect the parallel execution of hash joins: processor allocation and the use of hash filters. We first present a scheme to transform a bushy execution tree into an allocation tree, where each node denotes a pipeline. Processors are then allocated to the nodes of the allocation tree based on the concept of synchronous execution time, such that the inner relations (i.e., hash tables) in a pipeline become available at approximately the same time. In addition, hash filtering is investigated as a means to further improve overall performance. Performance studies are conducted via simulation to demonstrate the importance of processor allocation and to evaluate various schemes using hash filters. Simulation results indicate that processor allocation based on the allocation tree significantly outperforms that based on the original bushy tree, and that the effect of hash filtering becomes prominent as the number of relations in a query increases.
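
The abstract does not spell out the allocation procedure, so the following Python sketch is only one plausible reading of the idea that sibling pipelines should finish at about the same time. It assumes each node of the allocation tree carries a work estimate, that a node runs on all processors assigned to it, and that those processors are divided among its children in proportion to each child's subtree work. The names `PipelineNode`, `work`, and `allocate` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineNode:
    """One node of a hypothetical allocation tree: a pipeline of hash joins."""
    name: str
    work: float  # assumed cost estimate for this pipeline alone
    children: List["PipelineNode"] = field(default_factory=list)

    def subtree_work(self) -> float:
        # Work of this pipeline plus all pipelines that feed it.
        return self.work + sum(c.subtree_work() for c in self.children)


def allocate(node: PipelineNode, processors: int,
             out: Dict[str, int]) -> Dict[str, int]:
    """Assign `processors` to `node`, then split them among its children
    in proportion to subtree work (an assumption, not the paper's rule)."""
    if processors < max(1, len(node.children)):
        raise ValueError("need at least one processor per child pipeline")
    out[node.name] = processors
    if not node.children:
        return out
    total = sum(c.subtree_work() for c in node.children)
    remaining = processors
    for i, child in enumerate(node.children):
        still_to_serve = len(node.children) - 1 - i
        if still_to_serve == 0:
            share = remaining  # last child takes whatever is left
        else:
            share = max(1, round(processors * child.subtree_work() / total))
            # Leave at least one processor for each child not yet served.
            share = min(share, remaining - still_to_serve)
        remaining -= share
        allocate(child, share, out)
    return out


# Example tree: the root pipeline probes hash tables produced by A and B.
tree = PipelineNode("root", 10.0, [
    PipelineNode("A", 10.0, [PipelineNode("A1", 10.0),
                             PipelineNode("A2", 10.0)]),
    PipelineNode("B", 10.0),
])
plan = allocate(tree, 8, {})
```

In this example the subtree rooted at `A` carries three times the work of `B`, so it receives three times as many processors. Under an idealized linear-speedup assumption, both inner relations of the root pipeline would then be ready at roughly the same time.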