Accurate modeling of the hybrid hash join algorithm

Authors:
Jignesh M. Patel;Michael J. Carey;Mary K. Vernon
Affiliations:
Computer Sciences Department, University of Wisconsin, Madison;Computer Sciences Department, University of Wisconsin, Madison;Computer Sciences Department, University of Wisconsin, Madison
Venue:
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Year:
1994

Abstract

The join of two relations is an important operation in database systems. It occurs frequently in relational queries, and join performance is a significant factor in overall system performance. Cost models for join algorithms are used by query optimizers to choose efficient query execution strategies. This paper presents an efficient analytical model of an important join method, the hybrid hash join algorithm, that captures several key features of the algorithm's performance—including its intra-operator parallelism, interference between disk reads and writes, caching of disk pages, and placement of data on disk(s). Validation of the model against a detailed simulation of a database system shows that the response time estimates produced by the model are quite accurate.