We empirically compare the cost of estimating the selectivity of a star join using the sampling-based t-cross procedure with the cost of computing the join and obtaining the exact answer. The relative cost of sampling can be excessive when a join attribute value exhibits "heterogeneous skew." To alleviate this problem, we propose Algorithm TCM, a modified version of t-cross that incorporates "augmented frequent value" (AFV) statistics. We provide a sampling-based method for estimating AFV statistics that does not require indexes on attribute values, requires only one pass through each relation, and uses an amount of memory much smaller than the size of a relation. Our experiments show that the use of estimated AFV statistics can reduce the relative cost of sampling by orders of magnitude. We also show that the use of estimated AFV statistics can reduce the relative error of the classical System R selectivity formula.
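For context on the last claim, the following is a minimal sketch of the classical System R estimate for a two-way equijoin, which assumes uniformly distributed join values and takes the selectivity to be 1 / max(d_R, d_S), where d_R and d_S are the numbers of distinct join values. It is not the paper's Algorithm TCM and does not compute AFV statistics; the function names and the toy relations are assumptions made purely for illustration. It only shows why skewed join values can make the classical formula inaccurate.

```python
from collections import Counter

def exact_join_size(r, s):
    """Exact size of the equijoin of two lists of join-attribute values."""
    count_r, count_s = Counter(r), Counter(s)
    return sum(count_r[v] * count_s[v] for v in count_r if v in count_s)

def system_r_join_estimate(r, s):
    """Classical uniformity-based estimate: |R| * |S| / max(d_R, d_S)."""
    d_r, d_s = len(set(r)), len(set(s))
    return len(r) * len(s) / max(d_r, d_s)

# Hypothetical toy data: 100 rows per relation, 4 distinct join values.
uniform = [1, 2, 3, 4] * 25          # every value occurs 25 times
skewed = [1] * 97 + [2, 3, 4]        # one value dominates

for name, rel in (("uniform", uniform), ("skewed", skewed)):
    exact = exact_join_size(rel, rel)
    estimate = system_r_join_estimate(rel, rel)
    print(f"{name}: exact={exact}, System R estimate={estimate:.0f}")
```

On the uniform data the estimate matches the exact join size, while on the skewed data it underestimates by a wide margin. This is the kind of error that frequent-value statistics are meant to reduce.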