On estimating the size of projections

Authors:
Jeffrey F. Naughton;S. Seshadri
Venue:
ICDT '90 Proceedings of the third international conference on database theory
Year:
1990

Cited by (9):

The power of sampling in knowledge discovery
PODS '94 Proceedings of the thirteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems

Random sampling for histogram construction: how much is enough?
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

On random sampling over joins
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

Towards estimation error guarantees for distinct values
PODS '00 Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports
Proceedings of the 27th International Conference on Very Large Data Bases

Sampling-Based Estimation of the Number of Distinct Values of an Attribute
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases

Query sampling in DB2 Universal Database
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data

Towards estimating the number of distinct value combinations for a set of attributes
Proceedings of the 14th ACM international conference on Information and knowledge management

Estimating the output cardinality of partial preaggregation with a measure of clusteredness
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Abstract

We present a new sampling algorithm for estimating the number of tuples in the projection of a relation. The algorithm requires no assumptions about the distributions of values in the attributes of the relation and converges faster and more smoothly than previous sampling algorithms for the problem. We give both a sound theoretical basis for the algorithm and experimental data from an implementation of the algorithm.
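
The abstract does not describe the algorithm itself, so the sketch below is not the authors' method. It only illustrates the general problem of estimating projection size from a sample, using the bias-corrected Chao estimator as a stand-in technique. The function name `estimate_projection_size`, its parameters, and the representation of a relation as a list of dicts are all assumptions made for illustration.

```python
import random
from collections import Counter


def estimate_projection_size(rows, attrs, sample_size, seed=0):
    """Estimate the number of distinct tuples in the projection of `rows`
    onto `attrs` from a uniform random sample.

    Stand-in technique (NOT the paper's algorithm): the bias-corrected
    Chao estimator  d + f1 * (f1 - 1) / (2 * (f2 + 1)),  where d is the
    number of distinct projected tuples seen in the sample, f1 the number
    seen exactly once, and f2 the number seen exactly twice.
    """
    rng = random.Random(seed)
    # Uniform sample without replacement; capped at the relation size.
    sample = rng.sample(rows, min(sample_size, len(rows)))

    # Count how often each projected tuple occurs in the sample.
    counts = Counter(tuple(row[a] for a in attrs) for row in sample)
    d = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)

    estimate = d + f1 * (f1 - 1) / (2 * (f2 + 1))

    # The projection has at least d tuples and at most len(rows) tuples.
    return min(float(len(rows)), max(float(d), estimate))


if __name__ == "__main__":
    # Hypothetical relation: attribute "a" has 10 distinct values,
    # attribute "b" is unique per row.
    relation = [{"a": i % 10, "b": i} for i in range(1000)]
    print(estimate_projection_size(relation, ["a"], sample_size=200))
    print(estimate_projection_size(relation, ["a", "b"], sample_size=200))
```

The final clamp keeps the estimate between the distinct count observed in the sample and the size of the relation, which are the only bounds that hold regardless of how values are distributed.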