On Disk Allocation of Intermediate Query Results in Parallel Database Systems

Authors:
Holger Märtens
Venue:
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Year:
1999

Abstract

For complex queries in parallel database systems, substantial amounts of data must be redistributed between operators executed on different processing nodes. Frequently, such intermediate results cannot be held in main memory and must be stored on disk. To limit the ensuing performance penalty, a data allocation must be found that supports parallel I/O to the greatest possible extent. In this paper, we propose declustering even self-contained units of temporary data processed in a single operation (such as individual buckets of parallel hash joins) across multiple disks. Using a suitable analytical model, we find that the improvement of parallel I/O outweighs the penalty of increased fragmentation.