Storage estimation for multidimensional aggregates in OLAP

  • Authors:
  • Kanda Runapongsa;Thomas P. Nadeau;Toby J. Teorey

  • Affiliations:
  • Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI;Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI;Department of Electrical Engineering and Computer Science, The University of Michigan, Ann Arbor, MI

  • Venue:
  • CASCON '99 Proceedings of the 1999 conference of the Centre for Advanced Studies on Collaborative research
  • Year:
  • 1999

Abstract

On-line analytical processing (OLAP) is an important technique for analyzing data in decision support systems. Most analytical queries require aggregation of the data of interest. Pre-aggregation is one of the most important techniques for reducing query response time. However, precomputing every aggregate takes a large amount of time and space. Deciding which aggregates to precompute, and how much space they require, is therefore important. By estimating the storage space required for each aggregate view, we can allocate space for aggregates efficiently and decide which aggregates to precompute. We investigate four existing strategies for this problem: two based on mathematical approximations, one based on sampling, and one hybrid approach based on mathematical approximation and sampling. We propose a new hybrid strategy that is also based on mathematical approximation and sampling and is easy to compute. We evaluate the accuracy of these algorithms in estimating the storage explosion due to aggregation for different data distributions and data densities. The results indicate that our proposed strategy approximates the explosion more accurately than the other strategies.
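
The abstract does not name the individual estimators, so the following is only an illustrative sketch of what a mathematical approximation of aggregate size can look like. It uses Cardenas' formula, a classic estimate of the expected number of distinct groups when rows fall uniformly at random into a fixed number of possible cells; whether this is one of the four strategies studied is an assumption, not something stated here. The function names `cardenas_estimate` and `exact_group_count` and the toy data are hypothetical.

```python
def cardenas_estimate(num_rows, num_cells):
    """Expected number of non-empty cells when num_rows rows are placed
    uniformly at random into num_cells possible cells (Cardenas' formula).
    Assumption: uniform distribution; skewed data will deviate from this."""
    if num_cells <= 0:
        return 0.0
    return num_cells * (1.0 - (1.0 - 1.0 / num_cells) ** num_rows)


def exact_group_count(rows, group_by):
    """Exact number of rows in the aggregate view: the number of distinct
    value combinations over the columns listed in group_by."""
    return len({tuple(row[i] for i in group_by) for row in rows})


if __name__ == "__main__":
    # Toy fact table (hypothetical): columns are (product, region, month).
    rows = [
        ("p1", "east", 1), ("p1", "east", 2), ("p2", "west", 1),
        ("p2", "east", 1), ("p1", "west", 2), ("p1", "east", 1),
    ]
    # Aggregate on (product, region): 2 products x 2 regions = 4 possible cells.
    estimate = cardenas_estimate(len(rows), 4)
    actual = exact_group_count(rows, (0, 1))
    print(f"estimated groups: {estimate:.2f}, actual groups: {actual}")
```

The estimate needs only the row count and the number of possible cells, which makes it cheap to compute, but it relies on the uniformity assumption. That is the kind of trade-off between a purely mathematical approximation and a sampling-based one that the abstract alludes to.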