Requirement-based data cube schema design

  • Authors: David W. Cheung; Bo Zhou; Ben Kao; Hongjun Lu; Tak Wah Lam; Hing Fung Ting
  • Affiliations (in author order): Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong; Department of Computer Science and Engineering, Zhejiang University, Hangzhou, China; Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong; Department of Computer Science, The Hong Kong University of Science and Technology, Hong Kong; Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong; Department of Computer Science and Information Systems, The University of Hong Kong, Hong Kong
  • Venue: Proceedings of the eighth international conference on Information and knowledge management
  • Year: 1999

Abstract

On-line analytical processing (OLAP) requires efficient processing of complex decision support queries over very large databases. It is well accepted that pre-computed data cubes can help reduce the response time of such queries dramatically. A very important design issue of an efficient OLAP system is therefore the choice of the right data cubes to materialize. We call this problem the data cube schema design problem. In this paper we show that the problem of finding an optimal data cube schema for an OLAP system with limited memory is NP-hard. As a more computationally efficient alternative, we propose a greedy approximation algorithm cMP and its variants. Algorithm cMP consists of two phases. In the first phase, an initial schema consisting of all the cubes required to efficiently answer the user queries is formed. In the second phase, cubes in the initial schema are selectively merged to satisfy the memory constraint. We show that cMP is very effective in pruning the search space for an optimal schema. This leads to a highly efficient algorithm. We report the efficiency and the effectiveness of cMP via an empirical study using the TPC-D benchmark. Our results show that the data cube schemas generated by cMP enable very efficient OLAP query processing.
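
The two-phase idea described in the abstract can be illustrated with a small sketch. This is not the authors' cMP algorithm, whose details the abstract does not give; it is a simplified greedy merge under stated assumptions. The function names (`cube_size`, `initial_schema`, `merge_until_fits`), the size estimate (product of attribute cardinalities, capped at the number of fact-table rows), the merge criterion (largest space saving first), and the sample attributes and numbers are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import prod


def cube_size(cube, cardinality, fact_rows):
    """Assumed size estimate: product of attribute cardinalities,
    capped at the number of rows in the fact table."""
    return min(prod(cardinality[a] for a in cube), fact_rows)


def initial_schema(queries):
    """Phase 1 (sketch): one cube per distinct set of group-by attributes."""
    return {frozenset(q) for q in queries}


def merge_until_fits(schema, cardinality, fact_rows, budget):
    """Phase 2 (sketch): repeatedly merge the pair of cubes whose union
    saves the most space, until the schema fits the budget or no merge helps."""
    schema = set(schema)

    def size(cube):
        return cube_size(cube, cardinality, fact_rows)

    def saving(pair):
        a, b = pair
        return size(a) + size(b) - size(a | b)

    while len(schema) > 1 and sum(size(c) for c in schema) > budget:
        a, b = max(combinations(schema, 2), key=saving)
        if saving((a, b)) <= 0:
            break  # no remaining merge reduces the space used
        schema -= {a, b}
        schema.add(a | b)  # the merged cube can still answer both query groups
    return schema


# Hypothetical workload: attribute cardinalities and group-by sets are made up.
cardinality = {"part": 200, "supplier": 100, "customer": 150, "month": 12}
queries = [("part", "supplier"), ("part", "customer"), ("month",)]

start = initial_schema(queries)
final = merge_until_fits(start, cardinality, fact_rows=6000, budget=7000)
```

In this toy workload the two large cubes are merged into one cube over part, supplier and customer, because the fact-table cap keeps the merged cube no larger than either original. A fuller treatment would also weigh the extra query cost that each merge introduces, which this sketch ignores.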