The data cube operator encapsulates all possible groupings of a data set and has proved to be an invaluable tool for analyzing vast amounts of data. However, its apparent exponential complexity has significantly limited its applicability to low-dimensional data sets. Recently, the coalesced cube was introduced, and it was shown that high-dimensional coalesced cubes are orders of magnitude smaller than the original data cubes, even though they calculate and store every possible aggregation with 100% precision. In this paper we present an analytical framework for estimating the size of coalesced cubes. Applying this framework to uniform coalesced cubes, we show that their size and the required computation time scale polynomially with the dimensionality of the data set; therefore, a full data cube at 100% precision is not inherently cursed by high dimensionality. Additionally, we show that such coalesced cubes scale polynomially (and close to linearly) with the number of tuples in the data set. We also develop an efficient algorithm that estimates the size of a coalesced cube before it is computed, based only on metadata about the cube. Finally, we complement our analytical approach with an extensive experimental evaluation on real and synthetic data sets, and demonstrate that not only uniform but also Zipfian and real coalesced cubes scale polynomially.
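To make the "apparent exponential complexity" concrete, the following minimal sketch counts the tuples of a full, uncoalesced data cube by brute force. It is not the coalesced-cube size estimator described above; the function names `num_group_bys` and `full_cube_size` and the three-row fact table are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def num_group_bys(num_dims):
    """A cube over d dimensions contains one group-by per subset of dimensions."""
    return 2 ** num_dims


def full_cube_size(rows, num_dims):
    """Count the distinct groups over every group-by of `rows` (brute force).

    Each row is a tuple of dimension values; measures are omitted because
    only the number of stored aggregate tuples matters here.
    """
    total = 0
    for k in range(num_dims + 1):
        for dims in combinations(range(num_dims), k):
            # Project every row onto the chosen dimensions and deduplicate.
            groups = {tuple(row[i] for i in dims) for row in rows}
            total += len(groups)
    return total


# Hypothetical fact table with three dimensions (store, product, customer).
rows = [
    ("s1", "p1", "c1"),
    ("s1", "p2", "c1"),
    ("s2", "p1", "c3"),
]

cube_tuples = full_cube_size(rows, 3)
print(num_group_bys(3), cube_tuples)
```

The number of group-bys doubles with every added dimension, so this enumeration is feasible only for small dimensionality. That limit on the uncompressed cube is what motivates a size analysis for the coalesced representation.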