The effect of reading policy on early join result production

Authors:
Ramon Lawrence; Ralph P. Russo; Nariankadu D. Shyamalkumar
Affiliations:
Department of Computer Science, University of British Columbia Okanagan, Canada; Department of Statistics and Actuarial Science, The University of Iowa, United States; Department of Statistics and Actuarial Science, The University of Iowa, United States
Venue:
Information Sciences: an International Journal
Year:
2007

Abstract

Producing join results early, that is, before an entire input has been read, reduces query response time. This is especially important for interactive applications and for joins in mediator systems, where reading the inputs may be delayed by the network. Although several early join algorithms have been proposed, there has been no formal treatment of how different reading policies affect the number of results produced. In this work, we show that alternate reading is optimal among fixed reading policies, and we provide expressions for the expected number of results produced over time. Further, we analyze policies that adapt their execution to the tuples already read and to the distribution of the inputs. We present a greedy, adaptive algorithm that is optimal in the sense that, on average, it outperforms all other reading policies. However, the greedy policy is shown to perform only marginally better than the alternating policy. Thus, the alternating policy emerges as one that is easy to implement, requires no knowledge of the input distributions, is optimal among fixed policies, and is nearly optimal among all policies.
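
To make the notion of a fixed reading policy concrete, the following is a minimal sketch, not the authors' algorithm or analysis. It assumes a symmetric hash join over two in-memory key lists and counts the cumulative results after each tuple read. The names `results_over_time`, `alternating`, and `sequential` are hypothetical, and a single toy input cannot demonstrate the expected-value claims in the abstract; it only shows how the choice of which input to read next changes when results appear.

```python
from collections import Counter


def results_over_time(left, right, policy):
    """Cumulative join results after each read.

    Assumption: a symmetric hash join, where each newly read tuple probes
    the tuples already read from the other input and is then stored.
    `policy` is a sequence of "L"/"R" naming the input to read next.
    """
    seen = {"L": Counter(), "R": Counter()}
    inputs = {"L": iter(left), "R": iter(right)}
    produced, history = 0, []
    for side in policy:
        key = next(inputs[side])
        other = "R" if side == "L" else "L"
        produced += seen[other][key]  # matches against tuples read so far
        seen[side][key] += 1          # remember the tuple for later probes
        history.append(produced)
    return history


def alternating(n):
    """Fixed policy: read one tuple from each input in turn."""
    return "LR" * n


def sequential(n):
    """Fixed policy: read the whole left input, then the right input."""
    return "L" * n + "R" * n


left = [1, 2, 3, 4]
right = [3, 1, 4, 2]
alt = results_over_time(left, right, alternating(4))
seq = results_over_time(left, right, sequential(4))
print("alternating:", alt)
print("sequential: ", seq)
```

Both policies end with the same total number of results; they differ only in how early those results are produced, which is the quantity the abstract refers to.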