The polynomial complexity of fully materialized coalesced cubes

Authors:
Yannis Sismanis;Nick Roussopoulos
Affiliations:
Dept. of Computer Science, University of Maryland;Dept. of Computer Science, University of Maryland
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Citing 17
Cited 12

Implementing data cubes efficiently

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Cubetree: organization of and bulk incremental updates on the data cube

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
An array-based algorithm for simultaneous multidimensional aggregates

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Data cube approximation and histograms via wavelets

Proceedings of the seventh international conference on Information and knowledge management
On the complexity of the view-selection problem

PODS '99 Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Bottom-up computation of sparse and Iceberg CUBE

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Congressional samples for approximate answering of group-by queries

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
CubiST: a new algorithm for improving the performance of ad-hoc OLAP queries

Proceedings of the 3rd ACM international workshop on Data warehousing and OLAP
Dwarf: shrinking the PetaCube

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Index Selection for OLAP

ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Fast Computation of Sparse Datacubes

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
On the Computation of Multidimensional Aggregates

VLDB '96 Proceedings of the 22th International Conference on Very Large Data Bases
Data Warehouse Configuration

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
QC-trees: an efficient summary structure for semantic OLAP

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Hierarchical dwarfs for the rollup cube

DOLAP '03 Proceedings of the 6th ACM international workshop on Data warehousing and OLAP

CURE for cubes: cubing using a ROLAP engine

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
GORDIAN: efficient and scalable discovery of composite keys

VLDB '06 Proceedings of the 32nd international conference on Very large data bases
New Algorithm for Computing Cube on Very Large Compressed Data Sets

IEEE Transactions on Knowledge and Data Engineering
Research in data warehouse modeling and design: dead or alive?

DOLAP '06 Proceedings of the 9th ACM international workshop on Data warehousing and OLAP
Computing Iceberg Cubes by Top-Down and Bottom-Up Integration: The StarCubing Approach

IEEE Transactions on Knowledge and Data Engineering
ROLAP implementations of the data cube

ACM Computing Surveys (CSUR)
Sampling cube: a framework for statistical olap over sampling data

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Supporting the data cube lifecycle: the power of ROLAP

The VLDB Journal — The International Journal on Very Large Data Bases
Dwarfs in the rearview mirror: how big are they really?

Proceedings of the VLDB Endowment
An efficient method for maintaining data cubes incrementally

Information Sciences: an International Journal
Revisiting the cube lifecycle in the presence of hierarchies

The VLDB Journal — The International Journal on Very Large Data Bases
Toward automated large-scale information integration and discovery

Data Management in a Connected World

Quantified Score

Hi-index	0.00

Visualization

Abstract

The data cube operator encapsulates all possible groupings of a data set and has proved to be an invaluable tool in analyzing vast amounts of data. However its apparent exponential complexity has significantly limited its applicability to low dimensional datasets. Recently the idea of the coalesced cube was introduced, and showed that high-dimensional coalesced cubes are orders of magnitudes smaller in size than the original data cubes even when they calculate and store every possible aggregation with 100% precision. In this paper we present an analytical framework for estimating the size of coalesced cubes. By using this framework on uniform coalesced cubes we show that their size and the required computation time scales polynomially with the dimensionality of the data set and, therefore, a full data cube at 100% precision is not inherently cursed by high dimensionality. Additionally, we show that such coalesced cubes scale polynomially (and close to linearly) with the number of tuples on the dataset. We were also able to develop an efficient algorithm for estimating the size of coalesced cubes before actually computing them, based only on metadata about the cubes. Finally, we complement our analytical approach with an extensive experimental evaluation using real and synthetic data sets, and demonstrate that not only uniform but also zipfian and real coalesced cubes scale polynomially.