Design and evaluation of parallel pipelined join algorithms

Authors:
James P. Richardson;Hongjun Lu;Krishna Mikkilineni
Affiliations:
Honeywell, Inc., Golden Valley, MN; Honeywell, Inc., Golden Valley, MN; Honeywell, Inc., Golden Valley, MN
Venue:
SIGMOD '87 Proceedings of the 1987 ACM SIGMOD international conference on Management of data
Year:
1987

Citing 6
Cited 19

The art of computer programming, volume 3: (2nd ed.) sorting and searching

The art of computer programming, volume 3: (2nd ed.) sorting and searching
Join and Semijoin Algorithms for a Multiprocessor Database Machine

ACM Transactions on Database Systems (TODS)
Parallel algorithms for the execution of relational database operations

ACM Transactions on Database Systems (TODS)
Implementation techniques for main memory database systems

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Hashing Methods and Relational Algebra Operations

VLDB '84 Proceedings of the 10th International Conference on Very Large Data Bases
GAMMA - A High Performance Dataflow Database Machine

VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases

Join processing in relational databases

ACM Computing Surveys (CSUR)
Processing multi-join query in parallel systems

SAC '92 Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing: technological challenges of the 1990's
Query evaluation techniques for large databases

ACM Computing Surveys (CSUR)
A Symmetric Fragment and Replicate Algorithm for Distributed Joins

IEEE Transactions on Parallel and Distributed Systems
On the optimality of strategies for multiple joins

Journal of the ACM (JACM)
On the optimality of strategies for multiple joins

PODS '90 Proceedings of the ninth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Intensive Data Management in Parallel Systems: A Survey

Distributed and Parallel Databases
Optimal Secondary Storage Access Sequence for Performing Relational Join

IEEE Transactions on Knowledge and Data Engineering
Hash-Based and Index-Based Join Algorithms for Cube and Ring Connected Multicomputers

IEEE Transactions on Knowledge and Data Engineering
Parallel Hash-Based Join Algorithms for a Shared-Everything Environment

IEEE Transactions on Knowledge and Data Engineering
Applying Segmented Right-Deep Trees to Pipelining Multiple Hash Joins

IEEE Transactions on Knowledge and Data Engineering
Optimization of Parallel Execution for Multi-Join Queries

IEEE Transactions on Knowledge and Data Engineering
The Adaptive-Hash Join Algorithm for a Hypercube Multicomputer

IEEE Transactions on Parallel and Distributed Systems
Encapsulation of Parallelism and Architecture-Independence in Extensible Database Query Execution

IEEE Transactions on Software Engineering
The Design of XPRS

VLDB '88 Proceedings of the 14th International Conference on Very Large Data Bases
Hash-Based Join Algorithms for Multiprocessor Computers

VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins

VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Using Segmented Right-Deep Trees for the Execution of Pipelined Hash Joins

VLDB '92 Proceedings of the 18th International Conference on Very Large Data Bases
Join algorithm costs revisited

The VLDB Journal — The International Journal on Very Large Data Bases

Quantified Score

Hi-index	0.01

Abstract

The join operation is the most costly operation in relational database management systems. Distributed and parallel processing can effectively speed up the join operation. In this paper, we describe a number of highly parallel and pipelined multiprocessor join algorithms using sort-merge and hashing techniques. Among them, two algorithms are parallel and pipelined versions of traditional sort-merge join methods, two algorithms use both hashing and sort-merge techniques, and another two are variations of the hybrid hash join algorithm. The performance of these algorithms is evaluated analytically against a generic database machine architecture. The methodology used in the design and evaluation of these algorithms is also discussed.

The results of the analysis indicate that using a hashing technique to partition the source relations can dramatically reduce the elapsed time. Hash-based algorithms outperform sort-merge algorithms in almost all cases because of their high parallelism. Hash-based sort-merge and hybrid hash methods provide similar performance in most cases. With large source relations, the algorithms which replicate the smaller relation usually give better elapsed time. Sharing memory among processors also improves performance somewhat.
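
As a rough illustration of the hash-partitioning idea mentioned in the abstract, the following Python sketch splits both source relations on the join attribute and joins each pair of partitions independently. It is not one of the paper's six algorithms: the function names (`hash_partition`, `local_hash_join`, `partitioned_join`), the dictionary row representation, and the sequential loop standing in for separate processors are all assumptions made for this example.

```python
from collections import defaultdict


def hash_partition(rows, key, n_parts):
    """Split rows into n_parts buckets by hashing the join attribute."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key]) % n_parts].append(row)
    return parts


def local_hash_join(r_part, s_part, r_key, s_key):
    """Build a hash table on r_part, then probe it with each row of s_part."""
    table = defaultdict(list)
    for r in r_part:
        table[r[r_key]].append(r)
    return [(r, s) for s in s_part for r in table.get(s[s_key], [])]


def partitioned_join(r, s, r_key, s_key, n_parts=4):
    """Equi-join r and s by joining matching hash partitions one pair at a time."""
    r_parts = hash_partition(r, r_key, n_parts)
    s_parts = hash_partition(s, s_key, n_parts)
    result = []
    # Rows with equal join values always land in the same partition index,
    # so each pair is independent. A parallel system could hand each pair
    # to a different processor; this sketch simply loops over them.
    for r_part, s_part in zip(r_parts, s_parts):
        result.extend(local_hash_join(r_part, s_part, r_key, s_key))
    return result
```

Because tuples with equal join values hash to the same partition, no partition pair needs data from any other, which is the property that lets the per-partition joins run concurrently.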