Early hash join: a configurable algorithm for the efficient and early production of join results

  • Authors:
  • Ramon Lawrence

  • Affiliations:
  • University of Iowa

  • Venue:
  • VLDB '05 Proceedings of the 31st international conference on Very large data bases
  • Year:
  • 2005

Abstract

Minimizing both the response time to produce the first few thousand results and the overall execution time is important for interactive querying. Current join algorithms either minimize the execution time at the expense of response time or minimize response time by producing results early without optimizing the total time. We present a hash-based join algorithm, called early hash join, which can be dynamically configured at any point during join processing to trade off faster production of results against overall execution time. We demonstrate that varying how inputs are read has a major effect on these two factors and provide formulas that allow an optimizer to calculate the expected rate of join output and the number of I/O operations performed using different input reading strategies. Experimental results show that early hash join performs significantly fewer I/O operations and executes faster than other early join algorithms, especially for one-to-many joins. Its overall execution time is comparable to that of standard hybrid hash join, but its response time is an order of magnitude faster. Thus, early hash join can replace hybrid hash join in any situation where a fast initial response time is beneficial, without the penalty in overall execution time exhibited by other early join algorithms.
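
To make the idea of early result production concrete, the sketch below shows a generic in-memory join that hashes both inputs and emits a match as soon as both of its tuples have been read. It is not the early hash join algorithm from the paper: it omits memory limits, partition flushing, and the cleanup phase. The function name `early_join` and the `read_ratio` parameter are assumptions made for illustration; `read_ratio` stands in for an input reading strategy by fixing how many tuples are read from each side per round.

```python
from collections import defaultdict
from itertools import islice


def early_join(left, right, key_left, key_right, read_ratio=(1, 1)):
    """Yield (left_row, right_row) pairs as soon as both rows have been read.

    read_ratio=(a, b) reads up to `a` rows from `left`, then up to `b` rows
    from `right`, and repeats until both inputs are exhausted. Hypothetical
    helper: all rows stay in memory, nothing is flushed to disk.
    """
    if min(read_ratio) < 1:
        raise ValueError("each side must read at least one row per round")

    tables = (defaultdict(list), defaultdict(list))  # one hash table per input
    inputs = (iter(left), iter(right))
    keys = (key_left, key_right)
    done = [False, False]

    while not all(done):
        for side in (0, 1):
            if done[side]:
                continue
            batch = list(islice(inputs[side], read_ratio[side]))
            if len(batch) < read_ratio[side]:
                done[side] = True  # a short batch means this input is exhausted
            for row in batch:
                k = keys[side](row)
                tables[side][k].append(row)  # insert into this side's table
                # Probe the other side's table and emit matches immediately.
                for match in tables[1 - side].get(k, ()):
                    yield (row, match) if side == 0 else (match, row)


# Illustrative data: join on the first field of each tuple.
orders = [(1, "a"), (2, "b"), (3, "c")]
items = [(2, "x"), (1, "y"), (2, "z")]
by_id = lambda row: row[0]

alternating = list(early_join(orders, items, by_id, by_id, read_ratio=(1, 1)))
left_biased = list(early_join(orders, items, by_id, by_id, read_ratio=(3, 1)))
```

Both calls produce the same set of pairs; only the order and timing of the output change with `read_ratio`, which is the kind of effect the abstract attributes to the choice of reading strategy.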