A Pareto model for OLAP view size estimation

Authors:
Thomas P. Nadeau; Toby J. Teorey
Affiliations:
University of Michigan, Ann Arbor, Michigan; University of Michigan, Ann Arbor, Michigan
Venue:
CASCON '01 Proceedings of the 2001 conference of the Centre for Advanced Studies on Collaborative research
Year:
2001



Abstract

On-Line Analytical Processing (OLAP) aims to extract useful information quickly from large amounts of data residing in a data warehouse. Pre-aggregation is a useful strategy for improving query response time. However, it is usually impossible to pre-aggregate along all combinations of the dimensions: the multi-dimensional nature of the data leads to a combinatorial explosion in the number and potential storage size of the aggregates, so we must pre-aggregate selectively. The cost/benefit analysis behind that selection involves estimating the storage requirements of the candidate aggregates. We present an original algorithm, based on the Pareto distribution model, for estimating the number of rows in an aggregate. We test the Pareto Model Algorithm empirically against three published algorithms and conclude that it is consistently the best of these algorithms for estimating view size.
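
To make the combinatorial explosion and the notion of "view size" concrete, here is a minimal Python sketch. It is not the Pareto Model Algorithm, whose details the abstract does not give. It only enumerates the candidate aggregates for a small, hypothetical fact table and counts their rows exactly, which is the quantity an estimator would try to predict without scanning the data. The table contents, the dimension names, and the helper functions are all assumptions made for illustration, and dimension hierarchies are ignored.

```python
from itertools import combinations

# Hypothetical toy fact table: (product, store, month, sales).
FACTS = [
    ("pen", "north", "jan", 5),
    ("pen", "north", "feb", 3),
    ("pen", "south", "jan", 2),
    ("ink", "north", "jan", 7),
    ("ink", "south", "feb", 1),
    ("ink", "south", "feb", 4),
]
DIMENSIONS = ("product", "store", "month")


def candidate_views(dimensions):
    """Yield every subset of the dimensions; each subset is one possible aggregate."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)


def exact_view_size(facts, dimensions, view):
    """Count the rows of the aggregate grouped by `view` (distinct key combinations)."""
    idx = [dimensions.index(d) for d in view]
    return len({tuple(row[i] for i in idx) for row in facts})


views = list(candidate_views(DIMENSIONS))  # 2**3 = 8 candidate views
sizes = {view: exact_view_size(FACTS, DIMENSIONS, view) for view in views}
for view, size in sizes.items():
    print(view or "(grand total)", size)
```

With d dimensions and no hierarchies there are 2^d candidate views, so counting each one exactly by scanning the fact table quickly becomes impractical for a real warehouse. That is why a cheap size estimate is needed for the cost/benefit analysis.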