Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Ripple joins for online aggregation
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
A scalable hash ripple join algorithm
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Informix under CONTROL: Online Query Processing
Data Mining and Knowledge Discovery
Implementation techniques for main memory database systems
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
RPJ: producing fast join results on streams through rate-based optimization
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Early hash join: a configurable algorithm for the efficient and early production of join results
VLDB '05 Proceedings of the 31st international conference on Very large data bases
ACM Transactions on Database Systems (TODS)
FlashDB: dynamic self-tuning database for NAND flash
Proceedings of the 6th international conference on Information processing in sensor networks
Progressive merge join: a generic and non-blocking sort-based join algorithm
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The five-minute rule twenty years later, and how flash memory changes the rules
DaMoN '07 Proceedings of the 3rd international workshop on Data management on new hardware
A case for flash memory ssd in enterprise database applications
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Scalable approximate query processing with the DBO engine
ACM Transactions on Database Systems (TODS)
Proceedings of the VLDB Endowment
Online maintenance of very large random samples on flash storage
Proceedings of the VLDB Endowment
Query processing techniques for solid state drives
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
FlashLogging: exploiting flash devices for synchronous logging performance
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
An object placement advisor for DB2 using solid state storage
Proceedings of the VLDB Endowment
Lazy-Adaptive Tree: an optimized index structure for flash devices
Proceedings of the VLDB Endowment
Turbo-charging estimate convergence in DBO
Proceedings of the VLDB Endowment
Driver input selection for main-memory multi-way joins
Proceedings of the 28th Annual ACM Symposium on Applied Computing
Processing online aggregation on skewed data in mapreduce
Proceedings of the fifth international workshop on Cloud data management
Sampling estimators for parallel online aggregation
BNCOD'13 Proceedings of the 29th British National conference on Big Data
Hi-index | 0.00 |
Online aggregation is a promising solution to achieving fast early responses for interactive ad-hoc queries that compute aggregates on a large amount of data. Essential to the success of online aggregation is a good non-blocking join algorithm that enables both (i) high early result rates with statistical guarantees and (ii) fast end-to-end query times. We analyze existing non-blocking join algorithms and find that they all provide sub-optimal early result rates, and those with fast end-to-end times achieve them only by further sacrificing their early result rates. We propose a new non-blocking join algorithm, Partitioned expanding Ripple Join (PR-Join), which achieves considerably higher early result rates than previous non-blocking joins, while also delivering fast end-to-end query times. PR-Join performs separate, ripple-like join operations on individual hash partitions, where the width of a ripple expands multiplicatively over time. This contrasts with the non-partitioned, fixed-width ripples of Block Ripple Join. Assuming, as in previous non-blocking join studies, that the input relations are in random order, PR-Join ensures representative early results that are amenable to statistical guarantees. We show both analytically and with real-machine experiments that PR-Join achieves over an order of magnitude higher early result rates than previous non-blocking joins. We also discuss the benefits of using a flash-based SSD for temporary storage, showing that PR-Join can then achieve close to optimal end-to-end performance. Finally, we consider the joining of finite data streams that arrive over time, and find that PR-Join achieves similar or higher result rates than RPJ, the state-of-the-art algorithm specialized for that domain.