For complex queries in parallel database systems, substantial amounts of data must be redistributed between operators executed on different processing nodes. Frequently, such intermediate results cannot be held in main memory and must be stored on disk. To limit the ensuing performance penalty, a data allocation must be found that supports parallel I/O to the greatest possible extent.

In this paper, we propose declustering even self-contained units of temporary data processed in a single operation (such as individual buckets of parallel hash joins) across multiple disks. Using a suitable analytical model, we find that the improvement of parallel I/O outweighs the penalty of increased fragmentation.
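The trade-off named above can be illustrated with a toy cost calculation. This is only a sketch under simplifying assumptions, not the analytical model used in the paper, which the abstract does not spell out. It assumes that each fragment of a bucket costs one fixed positioning delay plus a per-page transfer time, that all disks work fully in parallel, and that pages are spread as evenly as possible. The function name `bucket_io_cost` and the parameters `t_pos_ms` and `t_xfer_ms` are hypothetical, and the default values are illustrative rather than measured.

```python
import math


def bucket_io_cost(pages, disks, t_pos_ms=10.0, t_xfer_ms=1.0):
    """Toy cost of reading one bucket declustered over several disks.

    Returns (elapsed_ms, total_busy_ms):
      elapsed_ms    -- time until the slowest disk finishes (parallel I/O gain)
      total_busy_ms -- summed busy time of all disks (fragmentation penalty)
    All parameters are illustrative assumptions, not values from the paper.
    """
    if pages < 1 or disks < 1:
        raise ValueError("pages and disks must be positive")
    used = min(disks, pages)                  # no more fragments than pages
    largest_fragment = math.ceil(pages / used)
    # Each disk pays one positioning delay, then transfers its fragment.
    elapsed_ms = t_pos_ms + largest_fragment * t_xfer_ms
    # Every fragment adds a positioning delay; transfer volume is unchanged.
    total_busy_ms = used * t_pos_ms + pages * t_xfer_ms
    return elapsed_ms, total_busy_ms


for d in (1, 2, 4, 8):
    elapsed, busy = bucket_io_cost(pages=200, disks=d)
    print(f"disks={d}: elapsed={elapsed:.0f} ms, total busy={busy:.0f} ms")
```

In this toy setting, spreading a 200-page bucket over more disks shortens the elapsed time while the summed disk busy time grows by one positioning delay per extra fragment. Whether the gain outweighs the penalty in a real system depends on the workload and device parameters, which is what the paper's analytical model addresses.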