Early hash join: a configurable algorithm for the efficient and early production of join results

Authors:
Ramon Lawrence
Affiliations:
University of Iowa
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Citing 22
Cited 12

Optimization of parallel query execution plans in XPRS

Distributed and Parallel Databases - Selected papers from the first international conference on parallel and distributed information systems
Cost-based query scrambling for initial delays

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
On random sampling over joins

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Ripple joins for online aggregation

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
An adaptive query execution system for data integration

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Dataflow query execution in a parallel main-memory environment

PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Rate-based query optimization for streaming information sources

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
A scalable hash ripple join algorithm

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Implementation techniques for main memory database systems

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Reducing the Braking Distance of an SQL Query Engine

VLDB '98 Proceedings of the 24rd International Conference on Very Large Data Bases
Online Dynamic Reordering for Interactive Data Processing

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Hash-Partitioned Join Method Using Dynamic Destaging Strategy

VLDB '88 Proceedings of the 14th International Conference on Very Large Data Bases
MJoin: a metadata-aware stream join operator

Proceedings of the 2nd international workshop on Distributed event-based systems
Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Online maintenance of very large random samples

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Toward a progress indicator for database queries

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Streaming queries over streaming data

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Progressive merge join: a generic and non-blocking sort-based join algorithm

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Maximizing the output rate of multi-way join queries over streaming information sources

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Lachesis: robust database storage management based on device-specific performance characteristics

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Supporting top-K join queries in relational databases

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
GridDB: a data-centric overlay for scientific grids

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

The effect of reading policy on early join result production

Information Sciences: an International Journal
Using slice join for efficient evaluation of multi-way joins

Data & Knowledge Engineering
Exploiting join cardinality for faster hash joins

Proceedings of the 2009 ACM symposium on Applied Computing
Using intrinsic data skew to improve hash join performance

Information Systems
RRPJ: result-rate based progressive relational join

DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
PR-join: a non-blocking join achieving higher early result rate with statistical guarantees

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
R-MESHJOIN for near-real-time data warehousing

DOLAP '10 Proceedings of the ACM 13th international workshop on Data warehousing and OLAP
Achieving high output quality under limited resources through structure-based spilling in XML streams

Proceedings of the VLDB Endowment
Resource optimization for processing of stream data in data warehouse environment

Proceedings of the International Conference on Advances in Computing, Communications and Informatics
HYBRIDJOIN for Near-Real-Time Data Warehousing

International Journal of Data Warehousing and Mining
Optimised X-HYBRIDJOIN for near-real-time data warehousing

ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124
A generic front-stage for semi-stream processing

Proceedings of the 22nd ACM international conference on Conference on information & knowledge management

Quantified Score

Hi-index	0.00

Visualization

Abstract

Minimizing both the response time to produce the first few thousand results and the overall execution time is important for interactive querying. Current join algorithms either minimize the execution time at the expense of response time or minimize response time by producing results early without optimizing the total time. We present a hash-based join algorithm, called early hash join, which can be dynamically configured at any point during join processing to tradeoff faster production of results for overall execution time. We demonstrate that varying how inputs are read has a major effect on these two factors and provide formulas that allow an optimizer to calculate the expected rate of join output and the number of I/O operations performed using different input reading strategies. Experimental results show that early hash join performs significantly fewer I/O operations and executes faster than other early join algorithms, especially for one-to-many joins. Its overall execution time is comparable to standard hybrid hash join, but its response time is an order of magnitude faster. Thus, early hash join can replace hybrid hash join in any situation where a fast initial response time is beneficial without the penalty in overall execution time exhibited by other early join algorithms.