Seeking the truth about ad hoc join costs

  • Authors:
  • Laura M. Haas;Michael J. Carey;Miron Livny;Amit Shukla

  • Affiliations:
  • IBM Almaden Research Center, K55/B1, 650 Harry Road, San Jose, CA 95120, USA;IBM Almaden Research Center, K55/B1, 650 Harry Road, San Jose, CA 95120, USA;Computer Sciences Dept., University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, USA;Computer Sciences Dept., University of Wisconsin-Madison, 1210 West Dayton Street, Madison, WI 53706, USA

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 1997

Abstract

In this paper, we re-examine the results of prior work on methods for computing ad hoc joins. We develop a detailed cost model for predicting join algorithm performance, and we use the model to develop cost formulas for the major ad hoc join methods found in the relational database literature. We show that various pieces of “common wisdom” about join algorithm performance fail to hold up when analyzed carefully, and we use our detailed cost model to derive optimal buffer allocation schemes for each of the join methods examined here. We show that optimizing their buffer allocations can lead to large performance improvements, e.g., as much as a 400% improvement in some cases. We also validate our cost model's predictions by measuring an actual implementation of each join algorithm considered. The results of this work should be directly useful to implementors of relational query optimizers and query processing systems.
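The following is a minimal sketch showing why buffer allocation matters once a cost model charges separately for disk seeks and for page transfers, instead of simply counting pages. It is an illustration only, not the paper's cost formulas. It assumes a block nested loops join in which the outer relation R is read in chunks of `b_out` pages (one seek per chunk), and in which the inner relation S is rescanned once per chunk in reads of `b_in` pages (one seek per read). Output writing and CPU cost are ignored. The function names, the `DiskParams` type, and the seek and transfer costs are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskParams:
    seek_cost: float      # hypothetical cost of one positioning operation
    transfer_cost: float  # hypothetical cost of transferring one page


def bnl_join_cost(r_pages: int, s_pages: int, b_out: int, b_in: int,
                  disk: DiskParams) -> float:
    """Estimated I/O cost of a block nested loops join (illustrative model).

    Assumptions: each outer chunk of b_out pages costs one seek; for every
    outer chunk, S is scanned in reads of b_in pages, one seek per read.
    Output and CPU costs are ignored.
    """
    if b_out < 1 or b_in < 1:
        raise ValueError("each input needs at least one buffer page")
    chunks = math.ceil(r_pages / b_out)                    # passes over S
    seeks = chunks * (1 + math.ceil(s_pages / b_in))       # R chunk + S reads
    transfers = r_pages + chunks * s_pages                 # R once, S per chunk
    return seeks * disk.seek_cost + transfers * disk.transfer_cost


def best_split(r_pages: int, s_pages: int, memory: int, disk: DiskParams):
    """Try every split of `memory` pages between the inputs; return the cheapest."""
    best = None
    for b_out in range(1, memory):
        b_in = memory - b_out
        cost = bnl_join_cost(r_pages, s_pages, b_out, b_in, disk)
        if best is None or cost < best[2]:
            best = (b_out, b_in, cost)
    return best


if __name__ == "__main__":
    disk = DiskParams(seek_cost=10.0, transfer_cost=1.0)
    r, s, m = 1000, 1000, 101
    # Page-count view: give almost all memory to R to minimize rescans of S.
    naive = bnl_join_cost(r, s, b_out=m - 1, b_in=1, disk=disk)
    b_out, b_in, cost = best_split(r, s, m, disk)
    print(f"page-count split ({m - 1} outer, 1 inner): {naive:.0f}")
    print(f"seek-aware split ({b_out} outer, {b_in} inner): {cost:.0f}")
```

With these made-up parameters, the split that minimizes page transfers (nearly all memory to R) pays one seek for every page of S on every pass. Shifting some memory to the inner input buffer trades a few extra passes over S for far fewer seeks, and the exhaustive search over splits finds a much cheaper allocation. A real optimizer would use calibrated device parameters and the specific join method's cost formula.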