A parallel sort-merge-join algorithm that uses a divide-and-conquer approach to address the data skew problem is proposed. The proposed algorithm adds an extra, low-cost scheduling phase to the usual sort, transfer, and join phases. During the scheduling phase, a parallelizable optimization algorithm, using the output of the sort phase, attempts to balance the load across the multiple processors in the subsequent join phase. The algorithm naturally identifies the largest skew elements and assigns each of them to an optimal number of processors. Assuming a Zipf-like distribution of data skew, the algorithm is demonstrated to achieve very good load balancing for the join phase, and is shown to be very robust with respect to, among other things, the degree of data skew and the total number of processors.
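The idea of a scheduling phase can be illustrated with a minimal sketch. This is not the paper's optimization algorithm: it substitutes a simple greedy largest-first placement, and the names `zipf_counts` and `schedule`, the parameter `theta`, and the cost model (join work for a value is the product of its counts in the two relations) are all assumptions made for illustration. It also assumes that the work for a heavily skewed value can be divided evenly among several processors.

```python
import heapq
import math


def zipf_counts(num_values, total, theta=1.0):
    """Zipf-like counts: the i-th most frequent value gets a share ~ 1 / i**theta."""
    weights = [1.0 / (i ** theta) for i in range(1, num_values + 1)]
    norm = sum(weights)
    return [max(1, round(total * w / norm)) for w in weights]


def schedule(costs, num_procs):
    """Return per-processor loads after splitting skewed values and greedy placement.

    Illustrative stand-in for a scheduling phase, not the published algorithm.
    """
    target = sum(costs) / num_procs  # ideal load per processor

    # Divide step: any value whose cost exceeds the ideal load is split
    # into enough equal pieces that each piece fits under the target.
    pieces = []
    for cost in costs:
        parts = min(num_procs, max(1, math.ceil(cost / target)))
        pieces.extend([cost / parts] * parts)

    # Placement step: largest piece first, always onto the least-loaded processor.
    heap = [(0.0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    for piece in sorted(pieces, reverse=True):
        load, proc = heapq.heappop(heap)
        heapq.heappush(heap, (load + piece, proc))

    # Report loads in processor order.
    return [load for load, _ in sorted(heap, key=lambda entry: entry[1])]
```

Calling `schedule` on costs built from two `zipf_counts` outputs (for example, multiplying the counts value by value) gives loads whose maximum stays within twice the ideal per-processor load, because no piece exceeds that ideal load after the split. Without the split, the single most frequent value alone could exceed it.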