Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results

  • Authors:
  • Mohamed F. Mokbel; Ming Lu; Walid G. Aref

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

This paper introduces the hash-merge join algorithm (HMJ, for short), a new non-blocking join algorithm that deals with data items from remote sources via unpredictable, slow, or bursty network traffic. The HMJ algorithm is designed with two goals in mind: (1) Minimize the time to produce the first few results, and (2) Produce join results even if the two sources of the join operator occasionally get blocked. The HMJ algorithm has two phases: the hashing phase and the merging phase. The hashing phase employs an in-memory hash-based join algorithm that produces join results as quickly as data arrives. The merging phase is responsible for producing join results if the two sources are blocked. Both phases of the HMJ algorithm are connected via a flushing policy that flushes in-memory parts into disk storage once the memory is exhausted. Experimental results show that HMJ combines the advantages of two state-of-the-art non-blocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
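
To make the idea of the hashing phase concrete, the following is a minimal sketch of a generic in-memory symmetric hash join, which emits each result as soon as the matching tuple arrives. It is an illustration under assumptions, not the paper's implementation: the function name `symmetric_hash_join`, the `(source, key, value)` tuple layout, and the source labels `"A"` and `"B"` are all hypothetical, and the sketch assumes unlimited memory, so it omits the flushing policy and the merging phase entirely.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals):
    """Yield (key, a_value, b_value) as soon as a match is possible.

    `arrivals` is an iterable of (source, key, value) tuples, where
    `source` is "A" or "B" (hypothetical labels for the two inputs).
    Assumes everything fits in memory; no flushing or merging phase.
    """
    # One in-memory hash table per source, keyed on the join attribute.
    tables = {"A": defaultdict(list), "B": defaultdict(list)}

    for source, key, value in arrivals:
        other = "B" if source == "A" else "A"

        # Store the new tuple so later arrivals from the other source see it.
        tables[source][key].append(value)

        # Probe the other source's table and emit matches immediately.
        for match in tables[other].get(key, ()):
            if source == "A":
                yield (key, value, match)
            else:
                yield (key, match, value)


# Tuples arrive interleaved, in arbitrary order, from both sources.
stream = [
    ("A", 1, "a1"),
    ("B", 2, "b2"),
    ("B", 1, "b1"),
    ("A", 2, "a2"),
    ("A", 1, "a1x"),
]
results = list(symmetric_hash_join(stream))
```

Because each arriving tuple is inserted into its own table and probed against the other, the first result is produced as soon as any matching pair has arrived, without waiting for either input to finish. A bounded-memory variant would additionally need a policy for moving parts of the tables to disk and a later step to join the flushed parts, which is the role the abstract assigns to the flushing policy and the merging phase.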