The effectiveness of parallel processing of relational join operations is examined. Two major factors are identified that can limit the effective exploitation of parallelism: skew in the distribution of join attribute values, and the stochastic nature of task processing times. Expressions for the execution time of parallel hash join and semijoin are derived and their effectiveness is analyzed. When the parallel architecture uses many small processors, skew can turn some processors into bottlenecks while others are underutilized. Even in the absence of skew, variations in the processing times of the parallel tasks belonging to a query can lead to high task synchronization delay and limit the maximum speedup achievable through parallel execution. For example, when the task processing time on each processor is exponentially distributed with the same mean, the speedup is proportional to P/ln(P), where P is the number of processors. Other factors, such as memory size and communication bandwidth, can lower the speedup further. These effects are quantified using analytical models.
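The P/ln(P) behavior can be checked numerically with a minimal sketch. It is not the paper's model, and it rests on assumptions: the query splits into P independent tasks, each task time is exponential with mean 1, the query finishes when the slowest task finishes, and speedup is the ratio of expected serial time to expected parallel time. Under those assumptions the expected maximum of P such tasks is the harmonic number H_P = 1 + 1/2 + ... + 1/P, which grows like ln(P). The function names below are illustrative only.

```python
import random


def harmonic_number(p: int) -> float:
    """H_p = 1 + 1/2 + ... + 1/p, the expected maximum of p
    independent exponential variables with mean 1."""
    return sum(1.0 / k for k in range(1, p + 1))


def expected_speedup(p: int) -> float:
    """Assumed speedup: serial time (p tasks of mean 1, run one after
    another) divided by the expected time of the slowest parallel task."""
    return p / harmonic_number(p)


def simulated_speedup(p: int, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same ratio, using a seeded generator
    so that repeated runs give the same estimate."""
    rng = random.Random(seed)
    total_max = 0.0
    for _ in range(trials):
        # The query completes when the slowest of the p tasks completes.
        total_max += max(rng.expovariate(1.0) for _ in range(p))
    return p / (total_max / trials)


if __name__ == "__main__":
    for p in (2, 8, 32, 128):
        print(p, round(expected_speedup(p), 2), round(simulated_speedup(p), 2))
```

Because H_P approaches ln(P) plus a constant for large P, the ratio P/H_P has the same growth as the P/ln(P) stated above. The sketch leaves out skew, memory size, and communication bandwidth, which the abstract says reduce the speedup further.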