On producing join results early
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
The ability to produce join results early, that is, before an entire input has been read, reduces query response time. This is especially important for interactive applications and for joins in mediator systems, which may have to wait out network delays while reading their inputs. Although several early join algorithms have been proposed, there has been no formal treatment of how different reading policies affect the number of results produced. In this work, we show that alternate reading is optimal among fixed reading policies, and we provide expressions for the expected number of results produced over time. Further, we analyze policies that adapt their execution to the tuples already read and to the distribution of the inputs. We present a greedy, adaptive algorithm that is optimal in that it outperforms all reading policies on average. However, the greedy policy is shown to perform only marginally better than the alternating policy. Thus, the alternating policy emerges as one that is easy to implement, requires no knowledge of the input distributions, is optimal among fixed policies, and is nearly optimal among all policies.
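To make the notion of a fixed reading policy concrete, the following is a minimal sketch of an in-memory early equi-join that reports how many results have been produced after each tuple read. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function name `early_join_counts`, the encoding of a policy as a string of `L`/`R` characters, and the toy inputs are all hypothetical. It also runs on a single deterministic input, whereas the optimality claim above concerns the expected number of results.

```python
from collections import Counter


def early_join_counts(left, right, policy):
    """Simulate an early equi-join on join keys.

    `left` and `right` are sequences of join keys. `policy` is a string of
    'L' and 'R' characters (a hypothetical encoding) that fixes in advance
    which input is read at each step. Returns the cumulative number of join
    results produced after each read.
    """
    seen_left, seen_right = Counter(), Counter()
    left_pos = right_pos = 0
    total = 0
    counts = []
    for side in policy:
        if side == "L":
            key = left[left_pos]
            left_pos += 1
            # A new left tuple joins with every matching right tuple seen so far.
            total += seen_right[key]
            seen_left[key] += 1
        else:
            key = right[right_pos]
            right_pos += 1
            # A new right tuple joins with every matching left tuple seen so far.
            total += seen_left[key]
            seen_right[key] += 1
        counts.append(total)
    return counts


left_keys = [1, 2, 3, 4]
right_keys = [1, 2, 3, 4]

alternating = early_join_counts(left_keys, right_keys, "LRLRLRLR")
sequential = early_join_counts(left_keys, right_keys, "LLLLRRRR")

print(alternating)  # [0, 1, 1, 2, 2, 3, 3, 4]
print(sequential)   # [0, 0, 0, 0, 1, 2, 3, 4]
```

Both policies end with the same four results, but on this toy input the alternating policy has produced at least as many results as the sequential one after every read. The inputs here arrive in matching key order, which favors early matches; other orderings would yield different intermediate counts.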