Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results

Authors:
Mohamed F. Mokbel; Ming Lu; Walid G. Aref
Venue:
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Year:
2004

Abstract

This paper introduces the hash-merge join algorithm (HMJ, for short), a new non-blocking join algorithm that deals with data items from remote sources via unpredictable, slow, or bursty network traffic. The HMJ algorithm is designed with two goals in mind: (1) minimize the time to produce the first few results, and (2) produce join results even if the two sources of the join operator occasionally get blocked. The HMJ algorithm has two phases: the hashing phase and the merging phase. The hashing phase employs an in-memory hash-based join algorithm that produces join results as quickly as data arrives. The merging phase is responsible for producing join results if the two sources are blocked. Both phases of the HMJ algorithm are connected via a flushing policy that flushes in-memory parts into disk storage once the memory is exhausted. Experimental results show that HMJ combines the advantages of two state-of-the-art non-blocking join algorithms (XJoin and Progressive Merge Join) while avoiding their shortcomings.
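
As an illustration of the hashing phase only, the following is a minimal sketch of an in-memory hash-based join that emits results as tuples arrive. It assumes a symmetric hash join (one hash table per source, probe the opposite table, then insert); the abstract says only "in-memory hash-based join algorithm", so this choice, the function name `symmetric_hash_join`, the `(key, payload)` row shape, and the source labels "A" and "B" are all hypothetical. The flushing policy and the merging phase are not shown.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Iterator, List, Tuple

Row = Tuple[Hashable, str]  # (join key, payload); hypothetical row shape


def symmetric_hash_join(
    arrivals: Iterable[Tuple[str, Row]],
) -> Iterator[Tuple[Row, Row]]:
    """Yield (row_from_A, row_from_B) pairs as soon as both rows have arrived.

    `arrivals` is an interleaved stream of (source, row) items, where
    source is "A" or "B". Assumption: everything fits in memory, so no
    flushing to disk is modeled here.
    """
    tables: Dict[str, Dict[Hashable, List[Row]]] = {
        "A": defaultdict(list),
        "B": defaultdict(list),
    }
    for source, row in arrivals:
        other = "B" if source == "A" else "A"
        key = row[0]
        # Probe the opposite source's table first, so a result is emitted
        # immediately if a matching row has already arrived.
        for match in tables[other].get(key, []):
            yield (row, match) if source == "A" else (match, row)
        # Then remember this row so later arrivals from the other source
        # can join with it.
        tables[source][key].append(row)


# Interleaved arrivals from two remote sources (made-up sample data).
stream = [
    ("A", (1, "a1")),
    ("B", (2, "b2")),
    ("B", (1, "b1")),
    ("A", (2, "a2")),
    ("A", (1, "a1x")),
]
results = list(symmetric_hash_join(stream))
```

Because the function is a generator, a consumer receives each join result as soon as the second matching row arrives, without waiting for either input to finish. That is the non-blocking behavior the hashing phase is meant to provide.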