Turbo-charging estimate convergence in DBO

Authors:
Alin Dobra;Chris Jermaine;Florin Rusu;Fei Xu
Affiliations:
University of Florida;University of Florida and Rice University;University of Florida;University of Florida
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

DBO is a database system that uses randomized algorithms to give statistically meaningful estimates of the final answer to a multi-table, disk-based query throughout query execution, from start to finish. However, DBO's "time 'til utility" (or "TTU", the time until DBO can give a useful estimate) can be excessively long, particularly when a query joins many tables, when a join query includes a highly selective predicate on one or more of the tables, or when the data are skewed. In this paper, we describe Turbo DBO, a prototype database system that, like DBO, answers multi-table join queries in a scalable fashion, but often with a much lower TTU. The key innovation of Turbo DBO is a set of novel algorithms that search for and remember "partial match" tuples in a randomized fashion. These are tuples that satisfy some of the boolean predicates associated with the query and may later be grown into tuples that contribute to the final query result.
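The idea of growing partial matches into result tuples can be made concrete with a small in-memory sketch. This is not Turbo DBO's algorithm, which is disk-based and whose details the abstract does not give. It is a textbook ripple-join-style estimator for the hypothetical query `SELECT SUM(R.val) FROM R, S, T WHERE R.k1 = S.k1 AND S.k2 = T.k2`. Each table is scanned in its own random order, R-S pairs that satisfy the first predicate are remembered as partial matches so that a T tuple read later can complete them, and the sum over complete matches is scaled by the inverse of the product of the sampling fractions. Because each prefix is a simple random sample of its table and the samples are independent, the scaled sum is an unbiased estimate of the final SUM. All names (`online_sum_estimates`, `exact_sum`, `k1`, `k2`, `val`) are hypothetical.

```python
import random
from collections import defaultdict


def exact_sum(R, S, T):
    """Exact SUM(R.val) for R JOIN S ON R.k1 = S.k1 JOIN T ON S.k2 = T.k2 (for checking)."""
    s_by_k1 = defaultdict(list)
    for k1, k2 in S:
        s_by_k1[k1].append(k2)
    t_count = defaultdict(int)
    for (k2,) in T:
        t_count[k2] += 1
    return sum(val * t_count[k2] for k1, val in R for k2 in s_by_k1[k1])


def online_sum_estimates(R, S, T, seed=0):
    """Yield (tuples_read, estimate) after each read, once every table has been sampled.

    Hypothetical schema: R rows are (k1, val), S rows are (k1, k2), T rows are (k2,).
    Tables are read round-robin, each in its own random order, so every prefix
    is a simple random sample (without replacement) of that table.
    """
    rng = random.Random(seed)
    scans = {"R": rng.sample(R, len(R)), "S": rng.sample(S, len(S)), "T": rng.sample(T, len(T))}
    seen = {"R": 0, "S": 0, "T": 0}
    total = len(R) + len(S) + len(T)

    r_by_k1 = defaultdict(list)         # sampled R.val values, keyed by k1
    s_by_k1 = defaultdict(list)         # sampled S.k2 values, keyed by k1
    t_count = defaultdict(int)          # multiplicity of sampled T tuples, keyed by k2
    partials_by_k2 = defaultdict(list)  # remembered R-S partial matches (their R.val), keyed by S.k2
    running = 0                         # SUM(R.val) over complete R-S-T matches among sampled tuples

    step = 0
    while step < total:
        for name in ("R", "S", "T"):
            if seen[name] == len(scans[name]):
                continue  # this table is exhausted
            row = scans[name][seen[name]]
            seen[name] += 1
            step += 1
            if name == "R":
                k1, val = row
                r_by_k1[k1].append(val)
                for k2 in s_by_k1[k1]:  # new partial matches (r, s)
                    partials_by_k2[k2].append(val)
                    running += val * t_count[k2]  # completed by T tuples already read
            elif name == "S":
                k1, k2 = row
                s_by_k1[k1].append(k2)
                for val in r_by_k1[k1]:  # new partial matches (r, s)
                    partials_by_k2[k2].append(val)
                    running += val * t_count[k2]
            else:
                (k2,) = row
                t_count[k2] += 1
                running += sum(partials_by_k2[k2])  # complete every remembered partial match
            if all(seen.values()):
                # Scale up by the inverse of the product of the sampling fractions.
                scale = (len(R) * len(S) * len(T)) / (seen["R"] * seen["S"] * seen["T"])
                yield step, scale * running


if __name__ == "__main__":
    gen = random.Random(42)
    R = [(gen.randrange(50), gen.randrange(1, 100)) for _ in range(2000)]
    S = [(gen.randrange(50), gen.randrange(20)) for _ in range(2000)]
    T = [(gen.randrange(20),) for _ in range(500)]
    print("exact:", exact_sum(R, S, T))
    for step, est in online_sum_estimates(R, S, T):
        if step % 500 == 0:
            print(step, round(est))
```

Each complete R-S-T combination is counted exactly once, at the moment its last tuple is read, so once all tables are exhausted the estimate equals the exact answer. Because this toy keeps every scanned tuple in memory, it sidesteps the memory and disk constraints that make remembering partial matches a design problem in a real system such as DBO.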