Join processing in database systems with large main memories
ACM Transactions on Database Systems (TODS)
A Performance Comparison of Multimicro and Mainframe Database Architectures
IEEE Transactions on Software Engineering
Join and Semijoin Algorithms for a Multiprocessor Database Machine
ACM Transactions on Database Systems (TODS)
Implementation techniques for main memory database systems
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Limiting Factors of Join Performance on Parallel Processors
Proceedings of the Fifth International Conference on Data Engineering
Hashing Methods and Relational Algebra Operations
VLDB '84 Proceedings of the 10th International Conference on Very Large Data Bases
GAMMA - A High Performance Dataflow Database Machine
VLDB '86 Proceedings of the 12th International Conference on Very Large Data Bases
Why a single parallelization strategy is not enough in knowledge bases
PODS '89 Proceedings of the eighth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Join processing in relational databases
ACM Computing Surveys (CSUR)
Frame-sliced partitioned parallel signature files
SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
Dynamic Load Balancing in Very Large Shared-Nothing Hypercube Database Computers
IEEE Transactions on Computers
An effective algorithm for parallelizing sort merge joins in the presence of data skew
DPDS '90 Proceedings of the second international symposium on Databases in parallel and distributed systems
Considering data skew factor in multi-way join query optimization for parallel execution
The VLDB Journal — The International Journal on Very Large Data Bases - Parallelism in database systems
Effectiveness of Parallel Joins
IEEE Transactions on Knowledge and Data Engineering
A Graph Theoretical Approach to Determine a Join Reducer Sequence in Distributed Query Processing
IEEE Transactions on Knowledge and Data Engineering
Dynamic Load Balancing in Multicomputer Database Systems Using Partition Tuning
IEEE Transactions on Knowledge and Data Engineering
The Adaptive-Hash Join Algorithm for a Hypercube Multicomputer
IEEE Transactions on Parallel and Distributed Systems
A Virtual Bus Architecture for Dynamic Parallel Processing
IEEE Transactions on Parallel and Distributed Systems
Performance Analysis of Disk Arrays under Failure
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
An Adaptive Data Placement Scheme for Parallel Database Computer Systems
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Hash-Based Join Algorithms for Multiprocessor Computers
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
VLDB '90 Proceedings of the 16th International Conference on Very Large Data Bases
Handling Data Skew in Multiprocessor Database Computers Using Partition Tuning
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
A Taxonomy and Performance Model of Data Skew Effects in Parallel Joins
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Performance Analysis of a Load Balancing Hash-Join Algorithm for a Shared Memory Multiprocessor
VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Estimation of Query-Result Distribution and its Application in Parallel-Join Load Balancing
VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Query processing in a DBMS for cluster systems
Programming and Computing Software
Query evaluation techniques for cluster database systems
ADBIS'10 Proceedings of the 14th east European conference on Advances in databases and information systems
Introducing skew into the TPC-H benchmark
TPCTC'11 Proceedings of the Third TPC Technology conference on Topics in Performance Evaluation, Measurement and Characterization
Variations of the star schema benchmark to test the effects of data skew on query performance
Proceedings of the 4th ACM/SPEC International Conference on Performance Engineering
Hi-index | 0.00 |
Skew in the distribution of values taken by an attribute is identified as a major factor that can affect the performance of parallel architectures for relational joins. The effect of skew on the performance of two parallel architectures is evaluated using analytic models. In one architecture, called database machine (DBMC), data as well as processing power are distributed; while in the other architecture, called Single Processor Parallel Input/output (SPPI), data is distributed but the processing power is concentrated in one processor. The two architectures are compared in terms of the ratio of MIPS used by DBMC and SPPI to deliver the same throughput and response time. In addition, the horizontal growth potential of DBMC is evaluated in terms of maximum speedup achievable by DBMC relative to SPPI response time. The MIPS ratio as well as speedup are found to be very sensitive to the amount of skew. These suggest, careful thought should be given in parallelizing database applications and in the design of algorithms and query optimizer for parallel architectures.