Sampling-Based Selectivity Estimation for Joins Using Augmented Frequent Value Statistics

Authors:
Peter J. Haas; Arun N. Swami
Venue:
ICDE '95 Proceedings of the Eleventh International Conference on Data Engineering
Year:
1995

Abstract

We empirically compare the cost of estimating the selectivity of a star join using the sampling-based t-cross procedure with the cost of computing the join and obtaining the exact answer. The relative cost of sampling can be excessive when a join attribute value exhibits "heterogeneous skew." To alleviate this problem, we propose Algorithm TCM, a modified version of t-cross that incorporates "augmented frequent value" (AFV) statistics. We provide a sampling-based method for estimating AFV statistics that does not require indexes on attribute values, requires only one pass through each relation, and uses an amount of memory much smaller than the size of a relation. Our experiments show that the use of estimated AFV statistics can reduce the relative cost of sampling by orders of magnitude. We also show that the use of estimated AFV statistics can reduce the relative error of the classical System R selectivity formula.
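The abstract contrasts three ways to size a join: computing it exactly, estimating it by cross-product sampling (t-cross), and applying the System R formula |R||S| / max(d_R, d_S), where d is the number of distinct join-attribute values. The following is a minimal sketch for a two-way equijoin only, not the star joins studied in the paper. It assumes each relation is given as a Python list of join-attribute values. It also assumes that t-cross means tuple-level sampling without replacement from each input, counting joining pairs in the cross product of the samples, and scaling by |R||S| / (m_R m_S). The function `frequent_value_estimate` and its parameter `k` are hypothetical. They show the general idea of handling frequent values exactly and applying System R to the rest. They are not the paper's AFV statistics or Algorithm TCM.

```python
import random
from collections import Counter


def exact_join_size(r, s):
    """Exact size of the equijoin r JOIN s: sum over v of f_r(v) * f_s(v)."""
    counts_s = Counter(s)
    return sum(counts_s[v] for v in r)


def t_cross_estimate(r, s, m_r, m_s, rng):
    """Cross-product sampling estimate (assumed tuple-level, without replacement).

    Each pair (i, j) lands in the sample cross product with probability
    (m_r / |r|) * (m_s / |s|), so scaling the match count by the inverse
    of that probability gives an unbiased estimate of the join size.
    """
    sample_r = rng.sample(r, m_r)
    sample_s = rng.sample(s, m_s)
    matches = exact_join_size(sample_r, sample_s)
    return len(r) * len(s) * matches / (m_r * m_s)


def system_r_estimate(r, s):
    """Classical System R formula: |r| * |s| / max(d_r, d_s)."""
    return len(r) * len(s) / max(len(set(r)), len(set(s)))


def frequent_value_estimate(r, s, k):
    """HYPOTHETICAL frequent-value correction to the System R formula.

    The k most frequent values of each input are joined exactly; the
    System R formula is applied only to the remaining tuples. This is an
    illustrative simplification, not the paper's AFV definition.
    """
    freq_r, freq_s = Counter(r), Counter(s)
    frequent = {v for v, _ in freq_r.most_common(k)}
    frequent |= {v for v, _ in freq_s.most_common(k)}
    # Counter returns 0 for missing keys, so one-sided values contribute 0.
    exact_part = sum(freq_r[v] * freq_s[v] for v in frequent)
    rest_r = [v for v in r if v not in frequent]
    rest_s = [v for v in s if v not in frequent]
    if not rest_r or not rest_s:
        return exact_part
    return exact_part + system_r_estimate(rest_r, rest_s)
```

For example, with `r = s = [0] * 900 + list(range(1, 101))`, the exact join size is 900 * 900 + 100 = 810100. `system_r_estimate` returns about 9901 because the uniformity assumption ignores the skewed value 0. `frequent_value_estimate(r, s, 1)` recovers 810100 by treating value 0 exactly. Calling `t_cross_estimate` with the full relation sizes as sample sizes also returns the exact answer. Smaller samples trade accuracy for cost, which is the tradeoff the paper measures.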