Presents a parallel hash join algorithm that is based on the concept of hierarchical hashing, to address the problem of data skew. The proposed algorithm splits the usual hash phase into a hash phase and an explicit transfer phase, and adds an extra scheduling phase between these two. During the scheduling phase, a heuristic optimization algorithm, using the output of the hash phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the hash partitions with the largest skew values and splits them as necessary, assigning each of them to an optimal number of processors. Assuming for concreteness a Zipf-like distribution of the values in the join column, a join phase which is CPU-bound, and a shared nothing environment, the algorithm is shown to achieve good join phase load balancing, and to be robust relative to the degree of data skew and the total number of processors. The overall speedup due to this algorithm is compared to some existing parallel hash join methods. The proposed method does considerably better in high skew situations.
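
The idea behind the scheduling phase can be illustrated with a minimal sketch. This is not the paper's heuristic, which the abstract does not specify. It swaps in a simple stand-in: split any partition larger than the average per-processor load into equal pieces, then place pieces greedily on the least-loaded processor (longest-processing-time-first). The function names, the Zipf parameter `theta`, and the assumption that join cost is proportional to partition size and divides evenly when a partition is split are all illustrative assumptions.

```python
import heapq
import math


def zipf_partition_sizes(num_partitions, total_tuples, theta=1.0):
    """Hypothetical hash-phase output: partition sizes following a Zipf-like law."""
    weights = [1.0 / (rank ** theta) for rank in range(1, num_partitions + 1)]
    norm = sum(weights)
    return [total_tuples * w / norm for w in weights]


def schedule(partition_sizes, num_processors):
    """Stand-in scheduling phase: split oversized partitions, then assign greedily.

    Assumes (for illustration only) that join cost is proportional to partition
    size and that a split partition divides its cost evenly among its pieces.
    """
    target = sum(partition_sizes) / num_processors  # ideal load per processor

    # Split every partition that exceeds the ideal load into equal pieces.
    pieces = []
    for pid, size in enumerate(partition_sizes):
        k = max(1, math.ceil(size / target)) if target > 0 else 1
        pieces.extend((size / k, pid) for _ in range(k))

    # Largest pieces first, each to the currently least-loaded processor.
    pieces.sort(reverse=True)
    heap = [(0.0, proc) for proc in range(num_processors)]
    heapq.heapify(heap)
    loads = [0.0] * num_processors
    assignment = [[] for _ in range(num_processors)]
    for piece_size, pid in pieces:
        load, proc = heapq.heappop(heap)
        load += piece_size
        loads[proc] = load
        assignment[proc].append((pid, piece_size))
        heapq.heappush(heap, (load, proc))
    return loads, assignment


if __name__ == "__main__":
    sizes = zipf_partition_sizes(num_partitions=64, total_tuples=1_000_000)
    loads, _ = schedule(sizes, num_processors=8)
    print("largest partition / ideal load:", max(sizes) / (sum(sizes) / 8))
    print("max load / ideal load:", max(loads) / (sum(sizes) / 8))
```

In this toy model the largest Zipf partition alone exceeds one processor's fair share, so without splitting no assignment could balance the join phase. A real join would also have to account for the cost of replicating or re-partitioning the matching tuples of the other relation when a partition is split, which this sketch ignores.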