Parallel online aggregation in action

Authors:
Chengjie Qin;Florin Rusu
Affiliations:
UC Merced, Merced, CA;UC Merced, Merced, CA
Venue:
Proceedings of the 25th International Conference on Scientific and Statistical Database Management
Year:
2013

Abstract

Online aggregation provides continuous estimates of the final result of a computation while the computation is still running. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution, or let the processing terminate and obtain the exact result. In this demonstration, we introduce a general framework for parallel online aggregation in which estimation incurs no overhead on top of the actual processing. We define a generic interface that can express any estimation model and completely abstracts away the execution details. We design multiple sampling-based estimators suited for parallel online aggregation and implement them inside the framework. Demonstration participants are shown how estimates for general SQL aggregation queries over terabytes of TPC-H data are generated throughout the entire processing. Due to parallel execution, the estimate converges to the correct result in a matter of seconds even for the most difficult queries. The behavior of the estimators is evaluated under different operating regimes of the distributed cluster used in the demonstration.
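
To make the idea of a sampling-based estimator concrete, the following is a minimal single-node sketch in Python. It is not one of the estimators from the demonstration, and it does not reflect the framework's interface. It assumes the data can be scanned in random order, estimates a SUM by scaling the running sample mean to the known population size, and uses a normal-approximation confidence interval with a finite population correction. All names (`OnlineSumEstimator`, `run_online_sum`, `rel_error`) are hypothetical.

```python
import math
import random


class OnlineSumEstimator:
    """Running estimate of SUM(x) over a population of known size.

    Assumption: values arrive as a simple random sample without replacement.
    """

    def __init__(self, population_size):
        self.population_size = population_size
        self.seen = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean (Welford)

    def add(self, value):
        """Fold one more sampled value into the running statistics."""
        self.seen += 1
        delta = value - self.mean
        self.mean += delta / self.seen
        self.m2 += delta * (value - self.mean)

    def estimate(self):
        """Scale the sample mean up to the whole population."""
        return self.population_size * self.mean

    def half_width(self, z=1.96):
        """Half-width of an approximate confidence interval (z=1.96 for ~95%)."""
        if self.seen < 2:
            return math.inf
        sample_var = self.m2 / (self.seen - 1)
        # Finite population correction: reaches 0 once every value has been seen,
        # so the interval collapses onto the exact result.
        fpc = 1.0 - self.seen / self.population_size
        return z * self.population_size * math.sqrt(sample_var / self.seen * fpc)


def run_online_sum(values, rel_error=0.05, seed=0):
    """Scan values in random order; stop once the interval is tight enough."""
    order = list(values)
    random.Random(seed).shuffle(order)  # stand-in for a randomized scan order
    est = OnlineSumEstimator(len(order))
    for v in order:
        est.add(v)
        # Require a minimum sample before trusting the normal approximation.
        if est.seen >= 30 and est.half_width() <= rel_error * abs(est.estimate()):
            break
    return est.estimate(), est.half_width(), est.seen
```

Calling `run_online_sum(values, rel_error=0.05)` returns the estimate, the interval half-width, and the number of values processed before stopping. With `rel_error=0` the scan runs to completion and returns the exact sum with a zero-width interval, which mirrors the two options the user has in online aggregation. A parallel version would additionally need to merge per-node statistics, which this sketch leaves out.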