SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
Query evaluation techniques for large databases
ACM Computing Surveys (CSUR)
Adaptive parallel aggregation algorithms
SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Selectivity and cost estimation for joins based on random sampling
Journal of Computer and System Sciences
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
Ripple joins for online aggregation
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
An adaptive query execution system for data integration
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Online Feedback for Nested Aggregate Queries with Multi-Threading
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
Benchmarking Database Systems A Systematic Approach
VLDB '83 Proceedings of the 9th International Conference on Very Large Data Bases
Large-Sample and Deterministic Confidence Intervals for Online Aggregation
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
Consistent database sampling as a database prototyping approach
Journal of Software Maintenance: Research and Practice
Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
A disk-based join with probabilistic guarantees
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Early hash join: a configurable algorithm for the efficient and early production of join results
VLDB '05 Proceedings of the 31st international conference on Very large data bases
NSJ: an efficient non-blocking spatial join algorithm
GIS '06 Proceedings of the 14th annual ACM international symposium on Advances in geographic information systems
ACM Transactions on Database Systems (TODS)
Online Random Shuffling of Large Database Tables
IEEE Transactions on Knowledge and Data Engineering
Scalable approximate query processing with the DBO engine
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
The effect of reading policy on early join result production
Information Sciences: an International Journal
A transducer-based XML query processor
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
The history of histograms (abridged)
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Scalable approximate query processing with the DBO engine
ACM Transactions on Database Systems (TODS)
A Vision for Next Generation Query Processors and an Associated Research Agenda
Globe '09 Proceedings of the 2nd International Conference on Data Management in Grid and Peer-to-Peer Systems
Turbo-charging estimate convergence in DBO
Proceedings of the VLDB Endowment
A formal framework for database sampling
Information and Software Technology
PR-join: a non-blocking join achieving higher early result rate with statistical guarantees
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Continuous sampling for online aggregation over multiple queries
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
The case for object databases in cloud data management
ICOODB'10 Proceedings of the Third international conference on Objects and databases
Improving online aggregation performance for skewed data distribution
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches
Foundations and Trends in Databases
You can stop early with COLA: online processing of aggregate queries in the cloud
Proceedings of the 21st ACM international conference on Information and knowledge management
Processing online aggregation on skewed data in mapreduce
Proceedings of the fifth international workshop on Cloud data management
Sampling estimators for parallel online aggregation
BNCOD'13 Proceedings of the 29th British National conference on Big Data
Hi-index | 0.00 |
Recently, Haas and Hellerstein proposed the hash ripple join algorithm in the context of online aggregation. Although the algorithm rapidly gives a good estimate for many join-aggregate problem instances, the convergence can be slow if the number of tuples that satisfy the join predicate is small or if there are many groups in the output. Furthermore, if memory overflows (for example, because the user allows the algorithm to run to completion for an exact answer), the algorithm degenerates to block ripple join and performance suffers. In this paper, we build on the work of Haas and Hellerstein and propose a new algorithm that (a) combines parallelism with sampling to speed convergence, and (b) maintains good performance in the presence of memory overflow. Results from a prototype implementation in a parallel DBMS show that its rate of convergence scales with the number of processors, and that when allowed to run to completion, even in the presence of memory overflow, it is competitive with the traditional parallel hybrid hash join algorithm.