Effectiveness of Parallel Joins

Authors:
M. S. Lakshmi; P. S. Yu
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
1990


Abstract

The effectiveness of parallel processing of relational join operations is examined. The skew in the distribution of join attribute values and the stochastic nature of the task processing times are identified as the major factors that can affect the effective exploitation of parallelism. Expressions for the execution time of parallel hash join and semijoin are derived and their effectiveness analyzed. When many small processors are used in the parallel architecture, the skew can cause some processors to become bottlenecks while other processors are underutilized. Even in the absence of skew, the variations in the processing times of the parallel tasks belonging to a query can lead to high task synchronization delay and limit the maximum speedup achievable through parallel execution. For example, when the task processing time on each processor is exponentially distributed with the same mean, the speedup is proportional to P/ln(P), where P is the number of processors. Other factors, such as memory size and communication bandwidth, can lead to even lower speedup. These are quantified using analytical models.
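
The P/ln(P) example can be checked with a small calculation. The sketch below is not taken from the paper. It assumes a query split into P independent tasks, one per processor, each with an exponentially distributed processing time of mean 1, and it assumes the query finishes only when the slowest task finishes. Under those assumptions the expected parallel time is the harmonic number H_P = 1 + 1/2 + ... + 1/P, which grows like ln(P), so the speedup is P / H_P. The function names are illustrative only.

```python
import random


def harmonic(p: int) -> float:
    """H_p = 1 + 1/2 + ... + 1/p: expected maximum of p i.i.d. Exp(1) task times."""
    return sum(1.0 / k for k in range(1, p + 1))


def analytic_speedup(p: int) -> float:
    """Expected sequential time (p tasks of mean 1) over expected parallel time H_p."""
    return p / harmonic(p)


def simulated_speedup(p: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio under the same assumptions."""
    rng = random.Random(seed)
    total_max = 0.0
    for _ in range(trials):
        # The query completes when the slowest of the p parallel tasks completes.
        total_max += max(rng.expovariate(1.0) for _ in range(p))
    mean_parallel_time = total_max / trials
    # Expected sequential time is p, since each task has mean 1.
    return p / mean_parallel_time


if __name__ == "__main__":
    for p in (1, 4, 16, 64, 256):
        print(p, round(analytic_speedup(p), 2), round(simulated_speedup(p, trials=2000), 2))
```

Because H_P exceeds ln(P) by roughly 0.58 for large P, the speedup under these assumptions sits somewhat below P/ln(P) but follows the same growth, well short of the ideal linear speedup of P.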