Quotient cube: how to summarize the semantics of a data cube

  • Authors:
  • Laks V. S. Lakshmanan;Jian Pei;Jiawei Han

  • Affiliations:
  • U. of British Columbia;Simon Fraser U.;U. of Illinois

  • Venue:
  • VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

Partitioning a data cube into sets of cells with "similar behavior" often better exposes the semantics in the cube. E.g., if we find that average boots sales in the West 10th store of Walmart was the same for winter as for the whole year, it signifies something interesting about the trend of boots sales in that location in that year. In this paper, we are interested in finding succinct summaries of the data cube, exploiting regularities present in the cube, with a clear basis. We would like the summary: (i) to be as concise as possible, (ii) to itself form a lattice preserving the rollup/drilldown semantics of the cube, and (iii) to allow the original cube to be fully recovered. We illustrate the utility of solving this problem and discuss the inherent challenges. We develop techniques for partitioning cube cells for obtaining succinct summaries, and introduce the quotient cube. We give efficient algorithms for computing it from a base table. For monotone aggregate functions (e.g., COUNT, MIN, MAX, SUM on non-negative measures, etc.), our solution is optimal (i.e., quotient cube of the least size). For nonmonotone functions (e.g., AVG), we obtain a locally optimal solution. We experimentally demonstrate the efficacy of our ideas and techniques and the scalability of our algorithms.