Range CUBE: Efficient Cube Computation by Exploiting Data Correlation

  • Authors:
  • Ying Feng; Divyakant Agrawal; Amr El Abbadi; Ahmed Metwally

  • Venue:
  • ICDE '04 Proceedings of the 20th International Conference on Data Engineering
  • Year:
  • 2004

Abstract

Data cube computation and representation are prohibitively expensive in terms of time and space. Prior work has focused on either reducing the computation time or condensing the representation of a data cube. In this paper, we introduce Range Cubing as an efficient way to compute and compress the data cube without any loss of precision. A new data structure, range trie, is used to compress and identify correlation in attribute values, and compress the input dataset to effectively reduce the computational cost. The range cubing algorithm generates a compressed cube, called range cube, which partitions all cells into disjoint ranges. Each range represents a subset of cells with the same aggregation value, as a tuple which has the same number of dimensions as the input data tuples. The range cube preserves the roll-up/drill-down semantics of a data cube. Compared to H-Cubing, experiments on real dataset show a running time of less than one thirtieth, still generating a range cube of less than one ninth of the space of the full cube, when both algorithms run in their preferred dimension orders. On synthetic data, range cubing demonstrates much better scalability, as well as higher adaptiveness to both data sparsity and skew.
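
The observation that many cells share one aggregation value when attribute values are correlated can be illustrated with a brute-force sketch. This is not the range trie or the range cubing algorithm from the paper. It enumerates the full cube and groups cells that cover exactly the same input rows, which may not coincide with the ranges the paper produces. The toy dataset, the `ALL` marker, and the function names `full_cube`, `group_by_cover`, and `aggregate` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker for a rolled-up (aggregated-away) dimension


def full_cube(rows, n_dims):
    """Map every non-empty cube cell to the set of input row indices it covers."""
    cells = defaultdict(set)
    for idx, row in enumerate(rows):
        dims = row[:n_dims]
        # Each row contributes to 2^n cells: every dimension is kept or rolled up.
        for mask in product((False, True), repeat=n_dims):
            cell = tuple(ALL if rolled else v for v, rolled in zip(dims, mask))
            cells[cell].add(idx)
    return cells


def group_by_cover(cells):
    """Group cells that cover the same rows; such cells share every aggregate."""
    groups = defaultdict(list)
    for cell, cover in cells.items():
        groups[frozenset(cover)].append(cell)
    return groups


def aggregate(rows, cover, measure_idx):
    """SUM of the measure column over the covered rows."""
    return sum(rows[i][measure_idx] for i in cover)


# Toy data (store, city, product, sales); store determines city,
# so the first two dimensions are perfectly correlated.
rows = [
    ("S1", "C1", "P1", 10),
    ("S1", "C1", "P2", 20),
    ("S2", "C2", "P1", 5),
]

cells = full_cube(rows, 3)
groups = group_by_cover(cells)

for cover, members in groups.items():
    print(sorted(members), "SUM =", aggregate(rows, cover, 3))
```

On this toy input the cells (S1, C1, *), (S1, *, *), and (*, C1, *) land in one group because store and city always occur together, so a compressed representation needs to store their shared aggregate only once.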