A Pareto Model for OLAP View Size Estimation

Authors:
Thomas P. Nadeau;Toby J. Teorey
Affiliations:
Computer Science and Engineering Division (CSE), Department of Electrical Engineering and Computer Science (EECS), The University of Michigan, 1301 Beal Avenue, Ann Arbor, MI 48109-2122, USA.
Venue:
Information Systems Frontiers
Year:
2003

Citing 0
Cited 3

Physical Database Design: the database professional's guide to exploiting indexes, views, storage, and more

Pruning attribute values from data cubes with diamond dicing

IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
On power-law distributed balls in bins and its applications to view size estimation

ISAAC'11 Proceedings of the 22nd international conference on Algorithms and Computation

Abstract

On-Line Analytical Processing (OLAP) aims to extract useful information quickly from large amounts of data residing in a data warehouse. Pre-aggregation is a useful strategy for improving query response time. However, it is usually impossible to pre-aggregate along all combinations of the dimensions, because the multi-dimensional nature of the data leads to a combinatorial explosion in the number and potential storage size of the aggregates. We must therefore pre-aggregate selectively. The cost/benefit analysis behind that selection requires estimating the storage requirements of the aggregates in question. We present an original algorithm, based on the Pareto distribution model, for estimating the number of rows in an aggregate. We test the Pareto Model Algorithm empirically against four published algorithms and conclude that it is consistently the best of these algorithms for estimating view size.
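
As a rough illustration of the problem the abstract describes, the sketch below enumerates every candidate aggregate for a small cube and attaches a row-count estimate to each. It does not implement the Pareto Model Algorithm, which the abstract does not specify. As a stand-in it uses the classic uniform-distribution estimate usually attributed to Cardenas, v(1 - (1 - 1/v)^n). The dimension names, cardinalities, fact-table size, and function names are all hypothetical, and dimension hierarchies are ignored.

```python
from itertools import combinations
from math import prod


def enumerate_views(dimensions):
    """Yield every subset of the dimensions; each subset is one candidate aggregate."""
    names = list(dimensions)
    for k in range(len(names) + 1):
        for combo in combinations(names, k):
            yield combo


def cardenas_estimate(n_rows, n_cells):
    """Expected number of distinct cells occupied when n_rows fall
    uniformly at random into n_cells possible cells."""
    if n_cells == 0:
        return 0.0
    return n_cells * (1.0 - (1.0 - 1.0 / n_cells) ** n_rows)


def estimate_view_rows(view, cardinalities, n_fact_rows):
    """Estimate the row count of the aggregate grouped by the dimensions in `view`.
    The number of possible cells is the product of the dimension cardinalities
    (1 for the empty view, i.e. the grand total)."""
    cells = prod(cardinalities[d] for d in view)
    return cardenas_estimate(n_fact_rows, cells)


# Hypothetical cube: three dimensions and a fact table of 100,000 rows.
cardinalities = {"product": 1000, "store": 50, "day": 365}
n_fact_rows = 100_000

views = list(enumerate_views(cardinalities))
estimates = {v: estimate_view_rows(v, cardinalities, n_fact_rows) for v in views}

if __name__ == "__main__":
    print(f"{len(views)} candidate views for {len(cardinalities)} dimensions")
    for view, rows in estimates.items():
        label = ", ".join(view) if view else "(grand total)"
        print(f"{label:25s} ~{rows:,.0f} rows")
```

With d dimensions and no hierarchies there are 2^d candidate views, so the enumeration above already becomes impractical for a few dozen dimensions. The uniform assumption behind this stand-in estimate is a known weakness on skewed data, which is the kind of gap a distribution-aware model is meant to address.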