Join and Semijoin Algorithms for a Multiprocessor Database Machine

Authors:
Patrick Valduriez; Georges Gardarin
Affiliations:
INRIA and University of Paris VI; INRIA and University of Paris VI
Venue:
ACM Transactions on Database Systems (TODS)
Year:
1984

Abstract

This paper presents and analyzes algorithms for computing joins and semijoins of relations in a multiprocessor database machine. First, a model of the multiprocessor architecture is described; its parameters for I/O, CPU, and message transmission times allow the execution times of the algorithms to be calculated. Three join algorithms are then presented and compared. It is shown that, for a given configuration, each algorithm has an application domain defined by the characteristics of the operand and result relations. Since the semijoin operator is useful for reducing I/O and transmission times in a multiprocessor system, we present and compare two equi-semijoin algorithms and one non-equi-semijoin algorithm. The execution times of these algorithms are generally linearly proportional to the sizes of the operand and result relations and inversely proportional to the number of processors. We then compare joining two relations directly with joining their semijoins, and show that the latter method is generally better. The algorithms presented are implemented in the SABRE database system, where an evaluation model selects the best algorithm for performing a join according to the results presented here. A first version of the SABRE system is currently operational at INRIA.
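The central comparison in the abstract, joining two relations directly versus joining their semijoins, can be illustrated with a small Python sketch. It assumes that both relations are hash-partitioned on the join attribute across `p` simulated processors, that relations are lists of Python tuples, and that the processors run one after another in a loop. The function names (`hash_partition`, `parallel_equijoin`, `parallel_semijoin`) are hypothetical. This is not a reconstruction of the paper's three join algorithms, its semijoin algorithms, or the SABRE implementation.

```python
from collections import defaultdict


def hash_partition(relation, key_index, p):
    """Split a relation (list of tuples) into p buckets by hashing the join attribute."""
    buckets = [[] for _ in range(p)]
    for t in relation:
        buckets[hash(t[key_index]) % p].append(t)
    return buckets


def parallel_equijoin(r, s, r_key, s_key, p):
    """Simulated parallel hash equi-join: 'processor' i joins bucket i of R with bucket i of S."""
    r_parts = hash_partition(r, r_key, p)
    s_parts = hash_partition(s, s_key, p)
    result = []
    for i in range(p):  # on a real machine these iterations would run concurrently
        table = defaultdict(list)  # build a hash table on the local S bucket
        for u in s_parts[i]:
            table[u[s_key]].append(u)
        for t in r_parts[i]:  # probe with the local R bucket
            for u in table.get(t[r_key], []):
                result.append(t + u)
    return result


def parallel_semijoin(r, s, r_key, s_key, p):
    """R semijoin S: keep the tuples of R whose join value occurs in S.

    Only the join values of each S bucket are needed, not whole S tuples.
    """
    r_parts = hash_partition(r, r_key, p)
    s_parts = hash_partition(s, s_key, p)
    result = []
    for i in range(p):
        values = {u[s_key] for u in s_parts[i]}
        result.extend(t for t in r_parts[i] if t[r_key] in values)
    return result


# Hypothetical example relations, joined on their first attribute.
R = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S = [(2, "x"), (4, "y"), (4, "z"), (5, "w")]
p = 3

direct = parallel_equijoin(R, S, 0, 0, p)
r_reduced = parallel_semijoin(R, S, 0, 0, p)  # R semijoin S
s_reduced = parallel_semijoin(S, R, 0, 0, p)  # S semijoin R
via_semijoins = parallel_equijoin(r_reduced, s_reduced, 0, 0, p)

print(len(R), len(r_reduced), len(S), len(s_reduced))
```

Both strategies produce the same three result tuples. The semijoins shrink R from 4 to 2 tuples and S from 4 to 3 before the join. Tuples that cannot contribute to the result are dropped early, which is how semijoins can reduce I/O and transmission volume. Whether that pays off depends on the cost of computing the semijoins themselves, which this sketch does not model.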