Seeking the truth about ad hoc join costs

Authors:
Laura M. Haas;Michael J. Carey;Miron Livny;Amit Shukla
Affiliations:
IBM Almaden Research Center, K55/B1, 650 Harry Road, San Jose, CA 95120, USA;IBM Almaden Research Center, K55/B1, 650 Harry Road, San Jose, CA 95120, USA;Computer Sciences Dept., University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, USA;Computer Sciences Dept., University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
1997


Abstract

In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive optimal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly useful to implementors of relational query optimizers and query processing systems.