A comparison of five probabilistic view-size estimation techniques in OLAP

Authors:
Kamel Aouiche;Daniel Lemire
Affiliations:
LICEF: Université du Québec à Montréal, Montreal, PQ, Canada;LICEF: Université du Québec à Montréal, Montreal, PQ, Canada
Venue:
Proceedings of the ACM tenth international workshop on Data warehousing and OLAP
Year:
2007

Citing 0
Cited 7

Report on the Tenth ACM International Workshop on Data Warehousing and OLAP (DOLAP'07)
ACM SIGMOD Record

A view selection algorithm with performance guarantee
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology

Sorting improves word-aligned bitmap indexes
Data & Knowledge Engineering

Reordering columns for smaller indexes
Information Sciences: an International Journal

On power-law distributed balls in bins and its applications to view size estimation
ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation

Reordering rows for better compression: Beyond the lexicographic order
ACM Transactions on Database Systems (TODS)

HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm
Proceedings of the 16th International Conference on Extending Database Technology

Abstract

A data warehouse cannot materialize all possible views, hence we must estimate quickly, accurately, and reliably the size of views to determine the best candidates for materialization. Many available techniques for view-size estimation make particular statistical assumptions and their error can be large. Comparatively, unassuming probabilistic techniques are slower, but they estimate accurately and reliably very large view sizes using little memory. We compare five unassuming hashing-based view-size estimation techniques including Stochastic Probabilistic Counting and LogLog Probabilistic Counting. Our experiments show that only Generalized Counting, Gibbons-Tirthapura, and Adaptive Counting provide universally tight estimates irrespective of the size of the view; of those, only Adaptive Counting remains constantly fast as we increase the memory budget.
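
To illustrate what a hashing-based view-size estimate can look like, the following is a minimal Python sketch in the spirit of Gibbons-Tirthapura adaptive sampling. It is not the authors' implementation. The function names (hash64, estimate_view_size), the choice of BLAKE2b as the hash function, and the memory_budget parameter are assumptions made for this example. The view size is taken to be the number of distinct tuples obtained by projecting the fact rows onto the view's dimensions.

```python
import hashlib


def hash64(item):
    """Map any printable item to a 64-bit integer (BLAKE2b is an assumption)."""
    digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def estimate_view_size(rows, memory_budget=256):
    """Estimate the number of distinct rows while storing at most
    memory_budget hash values (adaptive-sampling sketch)."""
    level = 0        # a hash is kept only if its `level` lowest bits are zero
    sample = set()   # hashes currently retained
    for row in rows:
        h = hash64(row)
        if h & ((1 << level) - 1) == 0:
            sample.add(h)
            # Over budget: raise the level, which keeps about half the
            # sample each time, until the sample fits again.
            while len(sample) > memory_budget:
                level += 1
                mask = (1 << level) - 1
                sample = {x for x in sample if x & mask == 0}
    # Each retained hash stands for roughly 2**level distinct rows.
    return len(sample) * (1 << level)


if __name__ == "__main__":
    # Hypothetical fact table with three dimensions per row.
    facts = [(i % 1000, i % 37, i % 5) for i in range(200_000)]
    # Size of the view grouped by the first two dimensions.
    view = ((a, b) for a, b, _ in facts)
    print(estimate_view_size(view, memory_budget=512))
```

While the number of distinct tuples stays within the budget, the level remains 0 and the count is exact, apart from hash collisions. Beyond that point the result is an estimate, and a larger budget buys a tighter one.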