Scalable approximate query processing with the DBO engine

Authors:
Christopher Jermaine;Subramanian Arumugam;Abhijit Pol;Alin Dobra
Affiliations:
University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL;University of Florida, Gainesville, FL
Venue:
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Year:
2007



Abstract

This paper describes query processing in the DBO database system. Like other database systems designed for ad-hoc, analytic processing, DBO can compute the exact answer to queries over a large relational database in a scalable fashion. Unlike any other system designed for analytic processing, DBO continuously maintains an estimate of the final answer to an aggregate query throughout execution, along with statistically meaningful bounds on the estimate's accuracy. As DBO gathers more information, the estimate becomes more accurate, until it is exact when the query completes. Users can therefore stop execution as soon as they are satisfied with the accuracy, which encourages exploratory data analysis.
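
To make the idea of a running estimate with shrinking bounds concrete, the following is a minimal sketch and not DBO's algorithm. It covers only a SUM over a single in-memory table scanned in random order, whereas the paper addresses scalable processing over a large relational database. The function name `online_sum`, the parameter `z`, and the choice of a normal-approximation interval with a finite population correction are all assumptions made for illustration.

```python
import math
import random


def online_sum(values, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) after each row of a random-order scan.

    Hypothetical illustration only: a single-table SUM, not DBO's estimator.
    """
    rows = list(values)
    # Scanning in random order makes every prefix a uniform sample
    # drawn without replacement from the table.
    random.Random(seed).shuffle(rows)
    n_total = len(rows)

    count, mean, m2 = 0, 0.0, 0.0  # Welford's running mean and squared deviations
    for x in rows:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)

        # Scale the sample mean up to the full table to estimate the SUM.
        estimate = n_total * mean

        if count == n_total:
            # Every row has been seen, so the estimate is the exact answer.
            half_width = 0.0
        elif count > 1:
            sample_var = m2 / (count - 1)
            # Finite population correction: shrinks to 0 as the scan completes.
            fpc = (n_total - count) / n_total
            half_width = z * n_total * math.sqrt(sample_var / count * fpc)
        else:
            # One row gives no variance information yet.
            half_width = math.inf

        yield count, estimate, half_width
```

A caller could consume the generator and stop as soon as `half_width` falls below a tolerance of its choosing, which mirrors the early-stopping behavior described in the abstract. If the scan runs to the end, the half-width reaches zero and the estimate equals the exact sum up to floating-point rounding.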