Online aggregation provides continuous estimates of the final result of a computation while the computation is still running. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution, or let the processing run to completion and obtain the exact result. In this demonstration, we introduce a general framework for parallel online aggregation in which estimation incurs no overhead on top of the actual processing. We define a generic interface for expressing any estimation model that completely abstracts the execution details. We design multiple sampling-based estimators suited to parallel online aggregation and implement them inside the framework. Demonstration participants are shown how estimates for general SQL aggregation queries over terabytes of TPC-H data are generated throughout the processing. Due to parallel execution, the estimate converges to the correct result in a matter of seconds, even for the most difficult queries. The behavior of the estimators is evaluated under different operating regimes of the distributed cluster used in the demonstration.
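To make the idea of a sampling-based estimate that tightens as processing advances more concrete, the following is a minimal single-node sketch in Python. It is not the framework's interface or one of its estimators: the function name `online_sum_estimates`, its parameters, and the choice of a textbook normal-approximation confidence interval with a finite population correction are all assumptions made for illustration. It scans the data in random order, so that the rows seen so far form a sample without replacement, and periodically reports an estimate of SUM together with an approximate confidence half-width.

```python
import math
import random


def online_sum_estimates(values, report_every=1000, z=1.96, seed=0):
    """Yield (rows_seen, sum_estimate, half_width) while scanning `values`.

    Hypothetical helper for illustration only. The scan order is shuffled,
    so the prefix processed so far is a uniform sample without replacement.
    """
    n_total = len(values)
    order = list(range(n_total))
    random.Random(seed).shuffle(order)

    # Welford's running mean and sum of squared deviations.
    count, mean, m2 = 0, 0.0, 0.0
    for idx in order:
        x = values[idx]
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

        if count % report_every == 0 or count == n_total:
            var = m2 / (count - 1) if count > 1 else 0.0
            # Finite population correction: shrinks to 0 once all rows are seen.
            fpc = (n_total - count) / n_total
            half_width = z * n_total * math.sqrt(var / count * fpc)
            # Scale the sample mean up to the full table to estimate SUM.
            yield count, n_total * mean, half_width


if __name__ == "__main__":
    data = [float(i % 100) for i in range(10000)]
    for seen, estimate, half in online_sum_estimates(data):
        print(f"{seen:>6} rows: SUM ~ {estimate:.1f} +/- {half:.1f}")
        # A caller could stop early once the interval is narrow enough.
        if half <= 0.01 * abs(estimate):
            break
```

A caller consumes the generator and stops as soon as the half-width is acceptable, or lets it run to the end, at which point the half-width is zero and the estimate equals the exact sum up to floating-point rounding. A parallel version would additionally have to merge per-node running statistics, which this sketch does not attempt.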