Approximate computation of multidimensional aggregates of sparse data using wavelets

  • Authors:
  • Jeffrey Scott Vitter;Min Wang

  • Affiliations:
  • Center for Geometric Computing and Department of Computer Science, Duke University, Durham, NC;Center for Geometric Computing and Department of Computer Science, Duke University, Durham, NC

  • Venue:
  • SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
  • Year:
  • 1999

Abstract

Computing multidimensional aggregates in high dimensions is a performance bottleneck for many OLAP applications. Obtaining the exact answer to an aggregation query can be prohibitively expensive in terms of time and/or storage space in a data warehouse environment. It is advantageous to have fast, approximate answers to OLAP aggregation queries.

In this paper, we present a novel method that provides approximate answers to high-dimensional OLAP aggregation queries in massive sparse data sets in a time-efficient and space-efficient manner. We construct a compact data cube, which is an approximate and space-efficient representation of the underlying multidimensional array, based upon a multiresolution wavelet decomposition. In the on-line phase, each aggregation query can generally be answered using the compact data cube in one I/O or a small number of I/Os, depending upon the desired accuracy.

We present two I/O-efficient algorithms to construct the compact data cube for the important case of sparse high-dimensional arrays, which often arise in practice. The traditional histogram methods are infeasible for the massive high-dimensional data sets in OLAP applications. Previously developed wavelet techniques are efficient only for dense data. Our on-line query processing algorithm is very fast and capable of refining answers as the user demands more accuracy. Experiments on real data show that our method provides significantly more accurate results for typical OLAP aggregation queries than other efficient approximation techniques such as random sampling.
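
The general idea of a wavelet-based compact representation can be illustrated with a one-dimensional toy sketch. It is not the paper's construction: the I/O-efficient algorithms for sparse high-dimensional arrays are not reproduced here, and the function names (`haar_decompose`, `keep_largest`, `haar_reconstruct`, `approx_range_sum`) and the sample data are assumptions made for illustration. The sketch applies an unnormalized Haar decomposition, keeps only the k largest-magnitude coefficients, and answers a range-sum query from the truncated coefficients. Keeping more coefficients refines the answer.

```python
def haar_decompose(values):
    """Unnormalized Haar transform of a list whose length is a power of two.

    Returns [overall average, detail coefficients from coarsest to finest].
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    current = [float(v) for v in values]
    details = []
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        averages = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details = diffs + details  # coarser levels end up in front
        current = averages
    return current + details


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    current = [coeffs[0]]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        pos += len(current)
        expanded = []
        for avg, diff in zip(current, diffs):
            expanded.extend([avg + diff, avg - diff])
        current = expanded
    return current


def keep_largest(coeffs, k):
    """Zero all but the k largest-magnitude coefficients.

    Simplification (assumption): raw magnitudes are compared, without the
    per-level normalization a more careful thresholding scheme would use.
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:k])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]


def approx_range_sum(compact_coeffs, lo, hi):
    """Approximate sum of values[lo..hi] (inclusive) from truncated coefficients.

    For clarity this reconstructs the whole array; it is not an efficient
    query algorithm.
    """
    return sum(haar_reconstruct(compact_coeffs)[lo:hi + 1])


# Hypothetical sample data: compare exact and approximate range sums.
data = [2, 2, 0, 2, 3, 5, 4, 4]
coeffs = haar_decompose(data)
for k in (1, 3, len(coeffs)):
    compact = keep_largest(coeffs, k)
    print(k, approx_range_sum(compact, 2, 5), sum(data[2:6]))
```

With all coefficients kept, the approximate answer equals the exact one; with fewer, the error depends on how much detail the discarded coefficients carried.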