Histograms have been used extensively for selectivity estimation in academic research and have been successfully adopted by the database industry. However, their estimation error is usually large for skewed distributions and biased attributes, both of which are typical of real-world data. We therefore propose effective models, based on information entropy, that quantitatively measure bias and selectivity. These models, together with the principle of maximum entropy, are then used to develop a class of entropy-based histograms. Moreover, because entropy can be computed incrementally, we present incremental variants of our algorithms that reduce the complexity of histogram construction from quadratic to linear. We conducted an extensive set of experiments on both synthetic and real-world datasets to compare the accuracy and efficiency of the proposed techniques with those of many other histogram-based techniques; the results show the superiority of the entropy-based approaches for both equality and range queries.
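The claim that entropy can be computed incrementally can be illustrated with a minimal sketch. It is not the paper's algorithm, only the standard identity H = log2(N) - (1/N) * sum(f * log2(f)), where the f are the value frequencies in a bucket and N is their sum. Keeping N and the running sum lets a bucket absorb one more value in constant time instead of rescanning all of its frequencies. The names `RunningEntropy` and `entropy_direct` are hypothetical and exist only for this illustration.

```python
import math


class RunningEntropy:
    """Shannon entropy (in bits) of a set of frequencies, updatable in O(1).

    Illustrative only: this is the textbook identity, not the authors' method.
    """

    def __init__(self):
        self.total = 0.0        # N: sum of all frequencies seen so far
        self.sum_f_log_f = 0.0  # S: sum of f * log2(f) over those frequencies

    def add(self, freq):
        """Absorb the frequency of one more distinct value."""
        if freq <= 0:
            raise ValueError("frequency must be positive")
        self.total += freq
        self.sum_f_log_f += freq * math.log2(freq)

    def entropy(self):
        """Return H = log2(N) - S / N, or 0.0 for an empty bucket."""
        if self.total == 0:
            return 0.0
        return math.log2(self.total) - self.sum_f_log_f / self.total


def entropy_direct(freqs):
    """Reference computation that rescans every frequency."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs)


# Usage: grow a bucket one value at a time and read its entropy after each step.
bucket = RunningEntropy()
for f in [8, 1, 1]:
    bucket.add(f)
    print(round(bucket.entropy(), 4))
```

A construction procedure that evaluates the entropy of every prefix of a sorted value list would pay a full rescan per prefix with `entropy_direct`, but only one constant-time update per prefix with the running form, which is the kind of quadratic-to-linear saving the abstract refers to.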