A Probabilistic Approach for Computing Approximate Iceberg Cubes

  • Authors:
  • Alfredo Cuzzocrea; Filippo Furfaro; Giuseppe M. Mazzeo

  • Affiliations:
  • ICAR-CNR, I-87036 Cosenza, Italy, and University of Calabria, I-87036 Cosenza, Italy (all three authors)

  • Venue:
  • DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
  • Year:
  • 2008

Abstract

An iceberg cube is a refinement of a data cube containing the subset of cells whose measure is larger than a given threshold (iceberg condition). Iceberg cubes are well-established tools supporting fast data analysis, as they filter the information contained in classical data cubes to provide the most relevant pieces of information. Although the problem of efficiently computing iceberg cubes has been widely investigated, this task is intrinsically expensive, due to the large amount of data that must usually be processed. Indeed, in several application scenarios, efficiency is so crucial that users would benefit from the fast computation of even incomplete iceberg cubes. In fact, an incomplete iceberg cube could support preliminary data analysis by allowing users to focus their explorations quickly and effectively, thus saving large amounts of computational resources. In this paper, we propose a technique for efficiently computing iceberg cubes, possibly trading off completeness of the result for computational efficiency. Specifically, we devise an algorithm that employs a probabilistic framework to prevent cells that are probably irrelevant (i.e., unlikely to satisfy the iceberg condition) from being computed. The output of our algorithm is an incomplete iceberg cube, which is computed efficiently and can be refined: the user can decide to compute the cells that were estimated to be irrelevant during previous invocations of the algorithm.
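
To make the iceberg condition and the completeness trade-off concrete, the following is a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the probabilistic framework, so a simple sample-based estimate stands in for it here. The function names (`iceberg_cube`, `approximate_iceberg_cube`, `refine`), the parameters `step` and `slack`, the SUM measure, and the toy sales rows are all assumptions introduced for illustration only.

```python
from collections import defaultdict
from itertools import combinations


def cell_totals(rows, group, measure):
    """SUM of `measure` for every cell of one group-by (assumed aggregate)."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[d] for d in group)] += row[measure]
    return totals


def iceberg_cube(rows, dims, measure, threshold):
    """Exact iceberg cube: keep cells whose SUM is larger than `threshold`."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            for key, total in cell_totals(rows, group, measure).items():
                if total > threshold:
                    cube[(group, key)] = total
    return cube


def approximate_iceberg_cube(rows, dims, measure, threshold, step=2, slack=0.5):
    """Hypothetical pruning rule: estimate each cell from every `step`-th row
    and aggregate exactly only cells whose scaled estimate reaches
    slack * threshold. Returns (cube, deferred), where `deferred` lists the
    cells that were skipped and can be refined later."""
    sample = rows[::step]
    cube, deferred = {}, []
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            estimates = cell_totals(sample, group, measure)
            candidates = {key for key, est in estimates.items()
                          if est * step >= slack * threshold}
            totals, skipped = defaultdict(float), set()
            for row in rows:
                key = tuple(row[d] for d in group)
                if key in candidates:
                    totals[key] += row[measure]
                else:
                    skipped.add(key)  # judged probably irrelevant; not aggregated
            for key, total in totals.items():
                if total > threshold:
                    cube[(group, key)] = total
            deferred.extend((group, key) for key in sorted(skipped))
    return cube, deferred


def refine(rows, measure, threshold, deferred):
    """Compute the deferred cells exactly and return those that qualify."""
    found = {}
    for group, key in deferred:
        total = sum(row[measure] for row in rows
                    if tuple(row[d] for d in group) == key)
        if total > threshold:
            found[(group, key)] = total
    return found


# Toy data (invented for this sketch).
rows = [
    {"region": "N", "product": "a", "sales": 10.0},
    {"region": "N", "product": "b", "sales": 5.0},
    {"region": "S", "product": "a", "sales": 2.0},
    {"region": "S", "product": "b", "sales": 1.0},
    {"region": "N", "product": "a", "sales": 8.0},
]
dims = ("region", "product")

exact = iceberg_cube(rows, dims, "sales", threshold=4)
partial, deferred = approximate_iceberg_cube(rows, dims, "sales", threshold=4)
recovered = refine(rows, "sales", 4, deferred)
```

On this toy input the approximate pass misses the cells for product `b` because that product never appears in the sample, and a later call to `refine` recovers them, so `partial` together with `recovered` equals `exact`. In this simplified version the pruning only saves aggregation work; it is meant to show the shape of the trade-off rather than the cost model of the paper.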