Relational confidence bounds are easy with the bootstrap

Authors:
Abhijit Pol; Christopher Jermaine
Affiliations:
University of Florida, Gainesville, FL; University of Florida, Gainesville, FL
Venue:
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Year:
2005

Abstract

Statistical estimation and approximate query processing have become increasingly prevalent applications for database systems. However, approximation is usually of little use without some sort of guarantee on estimation accuracy, or "confidence bound." Analytically deriving probabilistic guarantees for database queries over sampled data is a daunting task, not suitable for the faint of heart, and certainly beyond the expertise of the typical database system end-user. This paper considers the problem of incorporating into a database system a powerful "plug-in" method for computing confidence bounds on the answers to relational database queries over sampled or incomplete data. This statistical tool, called the bootstrap, is simple enough to be used by a database programmer with a rudimentary mathematical background, yet general enough to be applied to almost any statistical inference problem. Given the power and ease of use of the bootstrap, we argue that the algorithms presented for supporting it should be incorporated into any database system that is intended to support analytic processing.
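
To make the "plug-in" idea concrete, the following is a minimal sketch of the textbook percentile bootstrap applied to a SUM aggregate estimated from a uniform random sample. It is not the paper's algorithm, which the abstract does not describe; the function names `bootstrap_confidence_bound` and `scaled_sum`, the parameter defaults, and the synthetic table are all assumptions made for illustration.

```python
import random


def bootstrap_confidence_bound(sample, estimator, num_resamples=1000,
                               confidence=0.95, seed=0):
    """Percentile-bootstrap interval for estimator(sample).

    Hypothetical helper for illustration only: it resamples the sample
    with replacement, re-evaluates the estimator on each resample, and
    reads the interval off the sorted estimates.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(num_resamples):
        # Draw n rows with replacement from the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    estimates.sort()
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * num_resamples)
    hi_idx = min(num_resamples - 1, int((1 - alpha / 2) * num_resamples))
    return estimates[lo_idx], estimates[hi_idx]


def scaled_sum(table_size):
    """Return an estimator of SUM over the full table from sampled values.

    Assumes a uniform random sample, so the sample mean is scaled up by
    the (known) number of rows in the table.
    """
    def estimator(rows):
        return table_size * sum(rows) / len(rows)
    return estimator


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic "table" column; a real system would sample stored rows.
    table = [rng.uniform(0, 100) for _ in range(10_000)]
    sample = rng.sample(table, 500)
    low, high = bootstrap_confidence_bound(sample, scaled_sum(len(table)))
    print(f"true SUM = {sum(table):.0f}")
    print(f"95% bootstrap bound = [{low:.0f}, {high:.0f}]")
```

The estimator is passed in as an ordinary function, so the same resampling loop could in principle wrap a different aggregate (for example an average or a ratio) without any new analytical derivation, which is the generality the abstract emphasizes. Queries whose sampled result does not behave like a simple uniform sample, such as joins over independently sampled relations, would need more care than this sketch provides.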