PnP: sequential, external memory, and parallel iceberg cube computation

  • Authors:
  • Ying Chen;Frank Dehne;Todd Eavis;Andrew Rau-Chaplin

  • Affiliations:
  • Microsoft Corp., Redmond, USA;School of Computer Science, Carleton University, Ottawa, Canada;Department of Computer Science, Concordia University, Montreal, Canada;Faculty of Computer Science, Dalhousie University, Halifax, Canada

  • Venue:
  • Distributed and Parallel Databases
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present "Pipe 'n Prune" (PnP), a new hybrid method for iceberg-cube query computation. The novelty of our method is that it achieves a tight integration of top-down piping for data aggregation with bottom-up a priori data pruning. A particular strength of PnP is that it is efficient for all of the following scenarios: (1) Sequential iceberg-cube queries, (2) External memory iceberg-cube queries, and (3) Parallel iceberg-cube queries on shared-nothing PC clusters with multiple disks.We performed an extensive performance analysis of PnP for the above scenarios with the following main results: In the first scenario PnP performs very well for both dense and sparse data sets, providing an interesting alternative to BUC and Star-Cubing. In the second scenario PnP shows a surprisingly efficient handling of disk I/O, with an external memory running time that is less than twice the running time for full in-memory computation of the same iceberg-cube query. In the third scenario PnP scales very well, providing near linear speedup for a larger number of processors and thereby solving the scalability problem observed for the parallel iceberg-cubes proposed by Ng et al.