Pruning attribute values from data cubes with diamond dicing

Authors:
Hazel Webb;Owen Kaser;Daniel Lemire
Affiliations:
University of New Brunswick;University of New Brunswick;Université du Québec à Montréal
Venue:
IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
Year:
2008


Abstract

Data stored in a data warehouse are inherently multidimensional, but most data-pruning techniques (such as iceberg and top-k queries) are unidimensional. However, analysts need to issue multidimensional queries. For example, an analyst may need to select not just the most profitable stores or, separately, the most profitable products, but simultaneous sets of stores and products fulfilling some profitability constraints. To fill this need, we propose a new operator, the diamond dice. Because of the interaction between dimensions, the computation of diamonds is challenging. We present the first diamond-dicing experiments on large data sets. Our external memory algorithm avoids potentially expensive random accesses. Experiments show that we can compute diamond cubes over fact tables containing 100 million facts and 500,000 distinct attribute values in less than an hour using a single-core PC.
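
The store-and-product example can be made concrete with a small sketch. It is an illustration only, not the external memory algorithm from the paper. It assumes a two-dimensional cube held in memory, non-negative profits, and one sum threshold per dimension. The function name `diamond_dice` and the parameters `k_store` and `k_product` are hypothetical. The loop repeats because pruning a product can push a store below its threshold, and the reverse, which is the interaction between dimensions mentioned above.

```python
from collections import defaultdict


def diamond_dice(facts, k_store, k_product):
    """Naive in-memory sketch (assumed formulation, not the paper's algorithm).

    facts maps (store, product) -> non-negative profit. Returns the cells
    that survive once every remaining store totals at least k_store and
    every remaining product totals at least k_product.
    """
    cells = dict(facts)
    while True:
        # Recompute per-store and per-product totals over surviving cells.
        store_sum = defaultdict(float)
        product_sum = defaultdict(float)
        for (store, product), profit in cells.items():
            store_sum[store] += profit
            product_sum[product] += profit

        # Keep only cells whose store and product both meet their thresholds.
        kept = {
            (store, product): profit
            for (store, product), profit in cells.items()
            if store_sum[store] >= k_store and product_sum[product] >= k_product
        }

        # Stop when a full pass removes nothing (a fixed point).
        if len(kept) == len(cells):
            return kept
        cells = kept


facts = {
    ("A", "x"): 5, ("A", "y"): 5,
    ("B", "x"): 4, ("B", "y"): 4,
    ("C", "y"): 3, ("C", "z"): 6,
}
result = diamond_dice(facts, k_store=8, k_product=8)
```

In this toy input, store C initially totals 9 and passes on its own, but product z totals only 6 and is pruned, which drops C to 3 and removes it on the next pass. Pruning each dimension separately would miss that cascade.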