An efficient method for maintaining data cubes incrementally

Authors:
Ki Yong Lee;Yon Dohn Chung;Myoung Ho Kim
Affiliations:
Department of Computer Science, KAIST, 373-1 Yuseong-Gu, Guseong-Dong, Daejeon 305-701, South Korea;Department of Computer Science and Engineering, Korea University, 5 Ga, Anam-Dong, Seongbuk-Gu, Seoul 136-713, South Korea;Department of Computer Science, KAIST, 373-1 Yuseong-Gu, Guseong-Dong, Daejeon 305-701, South Korea
Venue:
Information Sciences: an International Journal
Year:
2010

Abstract

The data cube operator computes group-bys for all possible combinations of a set of dimension attributes. Since computing a data cube typically incurs a considerable cost, the data cube is often precomputed and stored as materialized views in data warehouses. A materialized data cube needs to be updated when the source relations change. Incremental maintenance of a data cube computes and propagates only its changes, rather than recomputing the entire data cube from scratch. For $n$ dimension attributes, the data cube consists of $2^n$ group-bys, each of which is called a cuboid. To incrementally maintain a data cube with $2^n$ cuboids, the conventional methods compute $2^n$ delta cuboids, each of which represents the change of a cuboid. In this paper, we propose an efficient incremental maintenance method that can maintain a data cube using only a subset of the $2^n$ delta cuboids. We formulate an optimization problem to find the optimal subset of the $2^n$ delta cuboids that minimizes the total maintenance cost, and propose a heuristic solution that allows us to maintain a data cube using only $\binom{n}{\lceil n/2 \rceil}$ delta cuboids. As a result, the cost of maintaining a data cube is substantially reduced. Through various experiments, we show the performance advantages of the proposed method over the conventional methods. We also extend the proposed method to handle partially materialized cubes and dimension hierarchies.
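
To make the counts in the abstract concrete, the following is a minimal Python sketch of the conventional scheme it describes: one delta cuboid per cuboid, applied to a materialized cube. It is not the method proposed in the paper. The function names (`cuboid`, `data_cube`, `apply_delta`), the sample rows, and the restriction to a SUM measure with insert-only changes are all assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def cuboid(rows, group_by, measure):
    """SUM(measure) grouped by the attributes in group_by (one cuboid)."""
    result = {}
    for row in rows:
        key = tuple(row[d] for d in group_by)
        result[key] = result.get(key, 0) + row[measure]
    return result


def data_cube(rows, dims, measure):
    """All 2^n cuboids, keyed by the tuple of grouping attributes."""
    return {
        group_by: cuboid(rows, group_by, measure)
        for k in range(len(dims) + 1)
        for group_by in combinations(dims, k)
    }


def apply_delta(cube, delta_cube):
    """Conventional maintenance: merge each delta cuboid into its cuboid.

    Assumes insert-only changes and a SUM measure, so merging is addition.
    """
    for group_by, delta in delta_cube.items():
        target = cube[group_by]
        for key, change in delta.items():
            target[key] = target.get(key, 0) + change


# Hypothetical source relation and a batch of newly inserted rows.
dims = ("product", "region", "year")
base = [
    {"product": "pen", "region": "east", "year": 2009, "sales": 10},
    {"product": "pen", "region": "west", "year": 2009, "sales": 20},
    {"product": "ink", "region": "east", "year": 2010, "sales": 5},
]
inserted = [
    {"product": "pen", "region": "east", "year": 2010, "sales": 7},
    {"product": "cap", "region": "west", "year": 2010, "sales": 3},
]

cube = data_cube(base, dims, "sales")
# The delta cube is computed from the inserted rows only: 2^n delta cuboids.
apply_delta(cube, data_cube(inserted, dims, "sales"))

# Incremental maintenance must agree with recomputation from scratch.
assert cube == data_cube(base + inserted, dims, "sales")

n = len(dims)
conventional_deltas = 2 ** n                 # one delta cuboid per cuboid
heuristic_deltas = comb(n, (n + 1) // 2)     # binomial(n, ceil(n/2))
print(conventional_deltas, heuristic_deltas)
```

For the three dimensions used here, the conventional scheme computes 8 delta cuboids, while the bound stated in the abstract is $\binom{3}{2} = 3$. How the paper selects those delta cuboids and propagates them to the remaining cuboids is not shown in this sketch.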