Join processing in database systems with large main memories
ACM Transactions on Database Systems (TODS)
Processing aggregate relational queries with hard time constraints
SIGMOD '89 Proceedings of the 1989 ACM SIGMOD international conference on Management of data
VLDB '89 Proceedings of the 15th international conference on Very large data bases
Random sampling from hash files
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Bifocal sampling for skew-resistant join size estimation
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Selectivity and cost estimation for joins based on random sampling
Journal of Computer and System Sciences
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Ripple joins for online aggregation
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Statistical estimators for relational algebra expressions
Proceedings of the seventh ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A scalable hash ripple join algorithm
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Large-Sample and Deterministic Confidence Intervals for Online Aggregation
SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
On producing join results early
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
A Non-Blocking Parallel Spatial Join Algorithm
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Progressive merge join: a generic and non-blocking sort-based join algorithm
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
ACM Transactions on Database Systems (TODS)
Scalable approximate query processing with the DBO engine
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Confidence bounds for sampling-based group by estimates
ACM Transactions on Database Systems (TODS)
Scalable approximate query processing with the DBO engine
ACM Transactions on Database Systems (TODS)
Progressive Evaluation of XML Queries for Online Aggregation and Progress Indicator
DEXA '09 Proceedings of the 20th International Conference on Database and Expert Systems Applications
Distributed online aggregations
Proceedings of the VLDB Endowment
ParaTimer: a progress indicator for MapReduce DAGs
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Continuous sampling for online aggregation over multiple queries
Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Approximate query answering and result refinement on XML data
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
You can stop early with COLA: online processing of aggregate queries in the cloud
Proceedings of the 21st ACM international conference on Information and knowledge management
Optimizing Sample Design for Approximate Query Processing
International Journal of Knowledge-Based Organizations
Hi-index | 0.00 |
One of the most common operations in analytic query processing is the application of an aggregate function to the result of a relational join. We describe an algorithm for computing the answer to such a query over large, disk-based input tables. The key innovation of our algorithm is that at all times, it provides an online, statistical estimator for the eventual answer to the query, as well as probabilistic confidence bounds. Thus, a user can monitor the progress of the join throughout its execution and stop the join when satisfied with the estimate's accuracy, or run the algorithm to completion with a total time requirement that is not much longer than other common join algorithms. This contrasts with other online join algorithms, which either do not offer such statistical guarantees or can only offer guarantees so long as the input data can fit into core memory.