Multidimensional cyclic graph approach: Representing a data cube without common sub-graphs

Authors:
Joubert de Castro Lima; Celso Massaki Hirata
Affiliations:
Federal University of Ouro Preto (UFOP), Morro do Cruzeiro, 35400-000 Ouro Preto, Minas Gerais, Brazil; Instituto Tecnológico de Aeronáutica (ITA), Praça Marechal Eduardo Gomes, Vila das Acácias, 12228-900 São José dos Campos, São Paulo, Brazil
Venue:
Information Sciences: an International Journal
Year:
2011

References:
Implementing data cubes efficiently. SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data.
An array-based algorithm for simultaneous multidimensional aggregates. SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data.
Bottom-up computation of sparse and Iceberg CUBE. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data.
Efficient computation of Iceberg cubes with complex measures. SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data.
Dwarf: shrinking the PetaCube. Proceedings of the 2002 ACM SIGMOD international conference on Management of data.
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery.
Discovery-Driven Exploration of OLAP Data Cubes. EDBT '98 Proceedings of the 6th International Conference on Extending Database Technology: Advances in Database Technology.
Fast Computation of Sparse Datacubes. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases.
On the Computation of Multidimensional Aggregates. VLDB '96 Proceedings of the 22nd International Conference on Very Large Data Bases.
Condensed Cube: An Efficient Approach to Reducing Data Cube Size. ICDE '02 Proceedings of the 18th International Conference on Data Engineering.
MM-Cubing: Computing Iceberg Cubes by Factorizing the Lattice Space. SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management.
Mining Constrained Gradients in Large Databases. IEEE Transactions on Knowledge and Data Engineering.
Data Mining: Concepts and Techniques.
Stream Cube: An Architecture for Multi-Dimensional Analysis of Data Streams. Distributed and Parallel Databases.
C-Cubing: Efficient Computation of Closed Cubes by Aggregation-Based Checking. ICDE '06 Proceedings of the 22nd International Conference on Data Engineering.
Regression Cubes with Lossless Compression and Aggregation. IEEE Transactions on Knowledge and Data Engineering.
Database in Depth: Relational Theory for Practitioners.
Efficient approaches for materialized views selection in a data warehouse. Information Sciences: an International Journal.
Data warehouse enhancement: A semantic cube model approach. Information Sciences: an International Journal.
Progressive ranking of range aggregates. Data & Knowledge Engineering.
Quotient cube: how to summarize the semantics of a data cube. VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases.
Star-cubing: computing iceberg cubes by top-down and bottom-up integration. VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29.
High-dimensional OLAP: a minimal cubing approach. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30.
OLAP over imprecise data with domain constraints. VLDB '07 Proceedings of the 33rd international conference on Very large data bases.
ARCube: supporting ranking aggregate queries in partially materialized data cubes. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Sampling cube: a framework for statistical olap over sampling data. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Supporting OLAP operations over imperfectly integrated taxonomies. Proceedings of the 2008 ACM SIGMOD international conference on Management of data.
Computing data cubes without redundant aggregated nodes and single graph paths: the sequential MCG approach. SBBD '08 Proceedings of the 23rd Brazilian symposium on Databases.
Graph OLAP: Towards Online Analytical Processing on Graphs. ICDM '08 Proceedings of the 2008 Eighth IEEE International Conference on Data Mining.
Computing data cubes using exact sub-graph matching: the sequential MCG approach. Proceedings of the 2009 ACM symposium on Applied Computing.
Emerging Cubes: Borders, size estimations and lossless reductions. Information Systems.
P-Cube: Answering Preference Queries in Multi-Dimensional Space. ICDE '08 Proceedings of the 2008 IEEE 24th International Conference on Data Engineering.

Abstract

We present the multidimensional cyclic graph (MCG) approach, a new technique for full cube computation and cube storage representation. The data cube relational operator has exponential complexity, so materializing a cube requires both a large amount of memory and substantial time. Reducing the size of data cubes without loss of generality is therefore a fundamental problem. Previous approaches, such as Dwarf, Star and MDAG, have substantially reduced cube size using graph representations. In general, they eliminate prefix redundancy and some suffix redundancy from a data cube. The MCG differs significantly from these approaches in that it completely eliminates both prefix and suffix redundancy. A data cube can be viewed as a set of sub-graphs. Redundant sub-graphs are quite common in a data cube, but eliminating them is a hard problem. The Dwarf, Star and MDAG approaches eliminate only some specific common sub-graphs. The MCG approach efficiently eliminates all common sub-graphs from the entire cube, based on an exact sub-graph matching solution. We propose a matching function that guarantees a one-to-one mapping between sub-graphs. The function is computed incrementally, in a top-down fashion, and uses a minimal amount of information to generate unique results. In addition, it can be computed for any measure type: distributive, algebraic or holistic. Performance analysis shows that MCG is 20-40% faster than the Dwarf, Star and MDAG approaches when computing sparse data cubes. Dense data cubes have a small number of aggregations, which leaves little room for runtime and memory consumption optimization; the MCG approach is therefore not useful for computing such dense cubes. The compact representation of sparse data cubes enables the MCG approach to reduce memory consumption by 70-90% compared to the original Star approach, proposed in [33]. In the same scenarios, the improved Star approach, proposed in [34], reduces memory consumption by only 10-30%, Dwarf by 30-50% and MDAG by 40-60%, compared to the original Star approach. The MCG is the first approach that uses an exact sub-graph matching function to reduce cube size, which avoids unnecessary aggregation and thereby improves cube computation runtime.
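
The redundancy described in the abstract can be illustrated with a small sketch. It is not the MCG algorithm or its matching function, which the abstract does not specify. It assumes a COUNT measure, stores the full cube as a prefix tree, and then merges structurally identical sub-graphs using a generic canonical signature (hash-consing). All function names here are hypothetical.

```python
from itertools import product

ALL = "*"  # marker for a rolled-up dimension (assumed notation)


def full_cube(rows):
    """Return {cell: count} for every group-by of the base rows (COUNT measure)."""
    cells = {}
    for row in rows:
        # Each base tuple contributes to 2^d cells: every dimension is
        # either kept or rolled up to ALL.
        for mask in product((False, True), repeat=len(row)):
            cell = tuple(ALL if rolled else value for value, rolled in zip(row, mask))
            cells[cell] = cells.get(cell, 0) + 1
    return cells


def build_trie(cells):
    """Store cube cells in a prefix tree, so shared prefixes are kept once."""
    root = {}
    for cell, measure in cells.items():
        node = root
        for value in cell[:-1]:
            node = node.setdefault(value, {})
        node[cell[-1]] = measure  # last dimension maps directly to the measure
    return root


def count_nodes(node):
    """Count internal (dict) nodes of the prefix tree; measures are not counted."""
    if not isinstance(node, dict):
        return 0
    return 1 + sum(count_nodes(child) for child in node.values())


def merge_common_subgraphs(node, pool):
    """Assign one id per distinct sub-graph; identical sub-graphs share an id.

    `pool` maps a canonical signature to its id, so len(pool) is the number
    of nodes left after merging. This is plain structural hashing, used here
    only to illustrate the idea of removing common sub-graphs.
    """
    if not isinstance(node, dict):
        return ("measure", node)
    signature = tuple(sorted(
        (label, merge_common_subgraphs(child, pool)) for label, child in node.items()
    ))
    return ("node", pool.setdefault(signature, len(pool)))


if __name__ == "__main__":
    # A tiny sparse relation: two tuples that share no dimension value.
    rows = [("a1", "b1", "c1"), ("a2", "b2", "c2")]
    cells = full_cube(rows)
    trie = build_trie(cells)
    pool = {}
    merge_common_subgraphs(trie, pool)
    print(len(cells), count_nodes(trie), len(pool))  # prints: 15 11 7
```

On this input the full cube has 15 cells, the prefix tree needs 11 internal nodes, and merging identical sub-graphs leaves 7. Sparse inputs tend to repeat the same suffix sub-graphs under different prefixes, which is where such merging pays off.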