Optimization in Data Cube System Design

  • Authors:
  • Edward Hung; David W. Cheung; Ben Kao

  • Affiliations:
  • Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong. ehung@cs.umd.edu
  • Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong. dcheung@csis.hku.hk
  • Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong. kao@csis.hku.hk

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2004

Abstract

The design of an OLAP system that supports real-time queries is a major research issue. One approach is to use data cubes, which are materialized, precomputed multidimensional views of the data in a data warehouse. We can derive a set of data cubes that answers each frequently asked query directly. However, there are two practical problems: (1) the maintenance cost of the data cubes, and (2) the query cost of answering those queries. Maintaining a data cube requires disk storage and CPU computation, so the maintenance cost is related to the total size as well as the total number of data cubes materialized. In most cases, materializing all data cubes is impractical. The maintenance cost may be reduced by merging some data cubes; however, the resulting larger data cubes will increase the query cost of answering some queries. If the bounds on the maintenance cost and the query cost are too strict, we help the user decide which queries to sacrifice and leave out of consideration. We define an optimization problem in data cube system design: given a maintenance-cost bound, a query-cost bound and a set of frequently asked queries, determine a set of data cubes such that the system can answer the largest possible subset of the queries without violating either bound. This problem is NP-hard. We propose the approximate greedy algorithms GR, 2GM and 2GMM, which experiments on a census data set and a forest-cover-type data set show to be both effective and efficient.
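
To make the optimization problem concrete, the sketch below shows a simple benefit-per-cost greedy selection over hypothetical Cube and Query structures: pick cubes that newly answer the most queries per unit of maintenance cost, stopping when the maintenance-cost bound is reached, and counting a query as answered only if some selected cube answers it within the query-cost bound. This is an assumed illustration of the problem setup, not the paper's GR, 2GM or 2GMM algorithms.

```python
# Illustrative greedy sketch of cube selection under a maintenance-cost bound
# and a per-query query-cost bound. Data model and heuristic are assumptions,
# not the algorithms proposed in the paper.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cube:
    name: str
    maintenance_cost: int  # cost of storing and refreshing this cube


@dataclass
class Query:
    name: str
    answerable_by: dict = field(default_factory=dict)  # cube name -> query cost


def greedy_select(cubes, queries, maint_bound, query_bound):
    """Select cubes so that as many queries as possible are answered within
    the maintenance-cost bound; a query counts as answered only if a selected
    cube answers it within the query-cost bound."""
    selected, answered = [], set()
    remaining = maint_bound
    candidates = list(cubes)
    while candidates:
        best, best_gain, best_newly = None, 0.0, []
        for cube in candidates:
            if cube.maintenance_cost > remaining:
                continue
            # Queries this cube would newly answer within the query-cost bound.
            newly = [q for q in queries
                     if q.name not in answered
                     and q.answerable_by.get(cube.name, query_bound + 1) <= query_bound]
            gain = len(newly) / max(cube.maintenance_cost, 1)
            if gain > best_gain:
                best, best_gain, best_newly = cube, gain, newly
        if best is None:
            break
        selected.append(best)
        answered.update(q.name for q in best_newly)
        remaining -= best.maintenance_cost
        candidates.remove(best)
    return selected, answered


if __name__ == "__main__":
    cubes = [Cube("c1", 4), Cube("c2", 3), Cube("c3", 5)]
    queries = [Query("q1", {"c1": 2, "c3": 1}),
               Query("q2", {"c2": 1}),
               Query("q3", {"c3": 2})]
    picked, answered = greedy_select(cubes, queries, maint_bound=8, query_bound=2)
    print([c.name for c in picked], sorted(answered))  # ['c3', 'c2'] ['q1', 'q2', 'q3']
```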