A Probabilistic Approach for Computing Approximate Iceberg Cubes

Authors:
Alfredo Cuzzocrea;Filippo Furfaro;Giuseppe M. Mazzeo
Affiliations:
ICAR-CNR, I-87036, Cosenza, Italy, and University of Calabria, Cosenza, Italy I-87036;ICAR-CNR, I-87036, Cosenza, Italy, and University of Calabria, Cosenza, Italy I-87036;ICAR-CNR, I-87036, Cosenza, Italy, and University of Calabria, Cosenza, Italy I-87036
Venue:
DEXA '08 Proceedings of the 19th international conference on Database and Expert Systems Applications
Year:
2008

Abstract

An iceberg cube is a refinement of a data cube containing only the subset of cells whose measure is larger than a given threshold (the iceberg condition). Iceberg cubes are well-established tools supporting fast data analysis, as they filter the information contained in classical data cubes to provide the most relevant pieces of information. Although the problem of efficiently computing iceberg cubes has been widely investigated, the task is intrinsically expensive because of the large amount of data that must usually be processed. Indeed, in several application scenarios, efficiency is so crucial that users would benefit from the fast computation of even an incomplete iceberg cube. An incomplete iceberg cube can support preliminary data analysis by allowing users to focus their explorations quickly and effectively, thus saving large amounts of computational resources. In this paper, we propose a technique for efficiently computing iceberg cubes that can trade completeness of the result for computational efficiency. Specifically, we devise an algorithm that employs a probabilistic framework to avoid computing cells that are probably irrelevant (i.e., cells that are unlikely to satisfy the iceberg condition). The output of our algorithm is an incomplete iceberg cube, which is computed efficiently and can be refined: the user can choose to compute the cells that were estimated to be irrelevant during previous invocations of the algorithm.
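To make the iceberg condition concrete, the following is a minimal sketch of an exact, brute-force iceberg cube computation. It is not the probabilistic algorithm proposed in the paper, whose details the abstract does not give. The function name `iceberg_cube`, the `ALL` placeholder, the SUM measure, and the sample rows are all assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker for a dimension that has been aggregated away


def iceberg_cube(rows, n_dims, threshold):
    """Return {cell: total} for every cell whose summed measure exceeds threshold.

    rows: a list of tuples (d_1, ..., d_n, measure).
    This naive version scans the data once per group-by, with no pruning.
    """
    result = {}
    for k in range(n_dims + 1):
        # Each subset of kept dimensions defines one group-by (cuboid).
        for kept in combinations(range(n_dims), k):
            totals = defaultdict(float)
            for row in rows:
                cell = tuple(row[i] if i in kept else ALL for i in range(n_dims))
                totals[cell] += row[n_dims]
            # Iceberg condition: keep only cells strictly above the threshold.
            for cell, total in totals.items():
                if total > threshold:
                    result[cell] = total
    return result


# Illustrative data: (city, product, units sold)
rows = [
    ("Rome", "laptop", 5),
    ("Rome", "phone", 2),
    ("Milan", "laptop", 4),
    ("Milan", "phone", 1),
]
cube = iceberg_cube(rows, n_dims=2, threshold=4)
for cell, total in sorted(cube.items()):
    print(cell, total)
```

With a threshold of 4, five of the nine possible cells survive; for instance, `("Milan", "laptop")` is dropped because its total equals the threshold rather than exceeding it. Every one of the 2^n group-bys is fully materialized before filtering, which is the cost that an approximate approach tries to avoid by skipping cells unlikely to pass the condition.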