Progressive join algorithms produce results incrementally as input data becomes available. Because they are nonblocking, they are particularly suitable for online processing of data streams. Reference algorithms of this family are the symmetric hash join, the X-Join and, more recently, the rate-based progressive join (RPJ). The symmetric hash join introduces the idea of processing the input streams symmetrically but assumes sufficient main memory. The X-Join shows that processing can scale to very large amounts of data if main memory is regularly flushed to disk and a reactive/cleanup phase is triggered for disk-resident data. The X-Join uses a simple largest-first flushing strategy, in which the largest partition is flushed to disk. The more recent RPJ predicts which main-memory tuples or partitions should be flushed to disk in order to maximize throughput, by computing their probability of contributing to a result. In this paper, we discuss the limitations of RPJ and propose a novel extension, called Result Rate-based Progressive Join (RRPJ), which addresses these limitations. Instead of computing the probabilities from statistics over the input data, RRPJ directly observes the output (result) statistics. This not only yields better performance, but also simplifies the generalization of the algorithm to non-relational data such as multidimensional and hierarchical data. We empirically show that RRPJ is effective and efficient and outperforms the state-of-the-art RPJ. We also investigate the relevance and performance of an adaptive version of these algorithms using amortization parameters.
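
To make the contrast between flushing strategies concrete, the following is a minimal in-memory sketch, not the algorithm as specified in the paper. It implements a toy symmetric hash join and two ways of picking a partition to flush: largest-first, and a lowest-observed-result-rate rule in the spirit of RRPJ. The class and method names, the per-partition counters, and the "results per stored tuple" score are all assumptions made for illustration; the paper's actual statistics and its disk and cleanup phases are not modeled.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy symmetric hash join over two streams named 'R' and 'S'.

    Hypothetical illustration only: no disk flushing or cleanup phase.
    """

    SIDES = ("R", "S")

    def __init__(self, num_partitions=4):
        self.num_partitions = num_partitions
        # tables[side][partition][key] -> payloads stored so far
        self.tables = {
            side: [defaultdict(list) for _ in range(num_partitions)]
            for side in self.SIDES
        }
        # results_seen[side][partition] -> results this partition took part in
        self.results_seen = {side: [0] * num_partitions for side in self.SIDES}

    def _partition(self, key):
        return hash(key) % self.num_partitions

    def insert(self, side, key, payload):
        """Probe the other side's table, then store the tuple on its own side."""
        other = "S" if side == "R" else "R"
        p = self._partition(key)
        matches = self.tables[other][p].get(key, [])
        # Always emit results as (R payload, S payload).
        results = [(payload, m) if side == "R" else (m, payload) for m in matches]
        # Assumption: credit each result to both participating partitions.
        self.results_seen[side][p] += len(results)
        self.results_seen[other][p] += len(results)
        self.tables[side][p][key].append(payload)
        return results

    def size(self, side, p):
        return sum(len(v) for v in self.tables[side][p].values())

    def _all_partitions(self):
        return [(s, p) for s in self.SIDES for p in range(self.num_partitions)]

    def largest_first_victim(self):
        """Largest-first: flush the partition holding the most tuples."""
        return max(self._all_partitions(), key=lambda sp: self.size(*sp))

    def lowest_result_rate_victim(self):
        """Flush the non-empty partition with the fewest observed results
        per stored tuple (an assumed score, not the paper's formula)."""
        non_empty = [sp for sp in self._all_partitions() if self.size(*sp) > 0]
        if not non_empty:
            return None
        return min(
            non_empty,
            key=lambda sp: self.results_seen[sp[0]][sp[1]] / self.size(*sp),
        )


join = SymmetricHashJoin(num_partitions=2)
for k in (0, 2, 4, 6):          # even keys land in partition 0
    join.insert("R", k, f"r{k}")
for k in (0, 2):                # these match stored R tuples
    join.insert("S", k, f"s{k}")
for k in (1, 3, 5):             # odd keys land in partition 1, no matches
    join.insert("R", k, f"r{k}")

print(join.largest_first_victim())       # the biggest partition
print(join.lowest_result_rate_victim())  # the least productive partition
```

In this small run the two rules disagree: largest-first would evict the partition that has been producing results, while the output-based rule evicts a smaller partition that has produced none. That difference is the intuition behind choosing flush victims from observed result statistics rather than from partition size alone.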