The join operation is the most costly operation in relational database management systems. Distributed and parallel processing can effectively speed up the join operation. In this paper, we describe a number of highly parallel and pipelined multiprocessor join algorithms using sort-merge and hashing techniques. Among them, two algorithms are parallel and pipelined versions of traditional sort-merge join methods, two algorithms use both hashing and sort-merge techniques, and another two are variations of the hybrid hash join algorithms. The performance of these algorithms is evaluated analytically against a generic database machine architecture. The methodology used in the design and evaluation of these algorithms is also discussed.

The results of the analysis indicate that using a hashing technique to partition the source relations can dramatically reduce the elapsed time. Hash-based algorithms outperform sort-merge algorithms in almost all cases because of their high parallelism. Hash-based sort-merge and hybrid hash methods provide similar performance in most cases. With large source relations, the algorithms which replicate the smaller relation usually give better elapsed time. Sharing memory among processors also improves performance somewhat.
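The idea of partitioning the source relations by hashing can be illustrated with a minimal sketch. It is not one of the paper's algorithms: the function names, the tuple-based relation layout, and the sequential loop over partitions are all assumptions made for illustration. Each pair of same-numbered buckets could in principle be handed to a separate processor, since matching tuples always hash to the same bucket.

```python
from collections import defaultdict

def partition(relation, key, n_parts):
    """Split a relation (a list of tuples) into n_parts buckets by hashing the join attribute."""
    buckets = [[] for _ in range(n_parts)]
    for row in relation:
        buckets[hash(row[key]) % n_parts].append(row)
    return buckets

def join_bucket(r_bucket, s_bucket, r_key, s_key):
    """Join one pair of buckets: build a hash table on R's bucket, probe it with S's bucket."""
    table = defaultdict(list)
    for r in r_bucket:
        table[r[r_key]].append(r)
    return [r + s for s in s_bucket for r in table.get(s[s_key], [])]

def partitioned_hash_join(R, S, r_key, s_key, n_parts=4):
    """Equi-join R and S on R[r_key] == S[s_key] using hash partitioning.

    The loop below runs the bucket pairs one after another; in a
    multiprocessor setting each pair is an independent unit of work.
    """
    r_parts = partition(R, r_key, n_parts)
    s_parts = partition(S, s_key, n_parts)
    result = []
    for r_bucket, s_bucket in zip(r_parts, s_parts):
        result.extend(join_bucket(r_bucket, s_bucket, r_key, s_key))
    return result

# Hypothetical example data: (id, label) tuples joined on the first field.
R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "x"), (3, "y"), (3, "z"), (4, "w")]
joined = partitioned_hash_join(R, S, r_key=0, s_key=0)
```

Because tuples with equal join values always land in buckets with the same index, no bucket of R ever needs to be compared with a differently numbered bucket of S, which is what makes the per-partition work independent.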