Online estimation for subset-based SQL queries

Authors:
Christopher Jermaine;Alin Dobra;Abhijit Pol;Shantanu Joshi
Affiliations:
University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL
Venue:
VLDB '05 Proceedings of the 31st international conference on Very large data bases
Year:
2005

Citing 6
Cited 5

Online aggregation

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Ripple joins for online aggregation

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Feedforward Neural Network Methodology

Feedforward Neural Network Methodology
Interactive Data Analysis: The Control Project

Computer
Large-Sample and Deterministic Confidence Intervals for Online Aggregation

SSDBM '97 Proceedings of the Ninth International Conference on Scientific and Statistical Database Management
Processing set expressions over continuous update streams

Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Scalable approximate query processing with the DBO engine

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Scalable approximate query processing with the DBO engine

ACM Transactions on Database Systems (TODS)
Sampling-based estimators for subset-based queries

The VLDB Journal — The International Journal on Very Large Data Bases
Turbo-charging estimate convergence in DBO

Proceedings of the VLDB Endowment
Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches

Foundations and Trends in Databases

Quantified Score

Hi-index	0.00

Visualization

Abstract

The largest databases in use today are so large that answering a query exactly can take minutes, hours, or even days. One way to address this problem is to make use of approximation algorithms. Previous work on online aggregation has considered how to give online estimates with ever-increasing accuracy for aggregate functions over relational join and selection queries. However, no existing work is applicable to online estimation over subset-based SQL queries-those queries with a correlated subquery linked to an outer query via a NOT EXISTS, NOT IN, EXISTS, or IN clause (other queries such as EXCEPT and INTERSECT can also be seen as subset-based queries). In this paper we develop algorithms for online estimation over such queries, and consider the difficult problem of providing probabilistic accuracy guarantees at all times during query execution.