The online aggregation system recently proposed by Hellerstein et al. permits interactive exploration of large, complex datasets stored in relational database management systems. Running confidence intervals are an important component of an online aggregation system: they indicate to the user the estimated proximity of each running aggregate to the corresponding final result. Large-sample confidence intervals contain the final result with a prespecified probability and rest on central limit theorems, while deterministic confidence intervals contain the final query result with probability 1. In this paper we show how new and existing central limit theorems, simple bounding arguments, and the delta method can be used to derive formulas for both large-sample and deterministic confidence intervals. To illustrate these techniques, we obtain formulas for running confidence intervals for single-table and multi-table AVG, COUNT, SUM, VARIANCE, and STDEV queries with join and selection predicates. Duplicate-elimination and GROUP-BY operations are also considered. We then provide numerically stable algorithms for computing the confidence intervals and analyze the complexity of these algorithms.
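To make the two kinds of interval concrete, here is a minimal Python sketch for the simplest case only: a single-table AVG with no join or selection predicate, with rows arriving in random order. It is not the paper's derivation. The large-sample interval uses the textbook normal approximation with Welford's update for the running variance (one common numerically stable scheme, swapped in here for illustration), and it ignores any finite-population correction. The deterministic interval assumes the total row count and a lower and upper bound on the column values are known in advance. The names `RunningAvgCI` and `deterministic_avg_bounds` are hypothetical.

```python
import math
from statistics import NormalDist


class RunningAvgCI:
    """Running AVG with a large-sample (CLT-based) confidence interval.

    Assumption: values are a simple random sample of the column.
    """

    def __init__(self, confidence=0.95):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        # Two-sided standard normal quantile for the requested confidence.
        self.z = NormalDist().inv_cdf(0.5 + confidence / 2.0)

    def add(self, x):
        # Welford's update: avoids subtracting two large, nearly equal sums.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Sample variance; undefined for fewer than two observations.
        return self.m2 / (self.n - 1) if self.n > 1 else None

    def interval(self):
        var = self.variance()
        if var is None:
            return None
        half_width = self.z * math.sqrt(var / self.n)
        return (self.mean - half_width, self.mean + half_width)


def deterministic_avg_bounds(partial_sum, n_seen, n_total, lo, hi):
    """Interval that contains the final AVG with probability 1.

    Assumption: every unseen value lies in [lo, hi] and n_total is known.
    Each unseen row contributes at least lo and at most hi to the final sum.
    """
    remaining = n_total - n_seen
    return ((partial_sum + remaining * lo) / n_total,
            (partial_sum + remaining * hi) / n_total)


if __name__ == "__main__":
    est = RunningAvgCI(confidence=0.95)
    for value in [12.0, 15.5, 9.0, 11.25, 14.0, 10.5]:
        est.add(value)
        print(est.n, est.mean, est.interval())
    print(deterministic_avg_bounds(partial_sum=72.25, n_seen=6,
                                   n_total=10, lo=0.0, hi=20.0))
```

The large-sample interval narrows roughly in proportion to one over the square root of the number of rows seen. The deterministic interval narrows only as fast as the unseen fraction of the table shrinks, which is why it is typically much wider early in the scan.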