Attribute value reordering for efficient hybrid OLAP

Authors:
Owen Kaser;Daniel Lemire
Affiliations:
Department of Computer Science and Applied Statistics, University of New Brunswick, Saint John, NB, Canada E2L 4L5;Université du Québec à Montréal, Montréal, QC, Canada
Venue:
Information Sciences: an International Journal
Year:
2006

Abstract

The normalization of a data cube is the ordering of its attribute values. For large multidimensional arrays where dense and sparse chunks are stored differently, proper normalization can lead to improved storage efficiency. We show that it is NP-hard to compute an optimal normalization even for 1×3 chunks, although we find an exact algorithm for 1×2 chunks. When dimensions are nearly statistically independent, we show that dimension-wise attribute frequency sorting is an optimal normalization and takes time O(d n log n) for data cubes of size n^d. When dimensions are not independent, we propose and evaluate several heuristics. The hybrid OLAP (HOLAP) storage mechanism is already 19-30% more efficient than ROLAP, but normalization can improve it further by 9-13%, for a total gain of 29-44% over ROLAP.
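
The dimension-wise attribute frequency sorting mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact procedure, so several details here are assumptions: the cube is represented as a list of allocated cells (tuples of attribute values), values are ranked in descending order of how many allocated cells they appear in, and ties are broken by the value's string form. The function names (frequency_sort_normalization, normalize, count_nonempty_chunks) are hypothetical, and counting non-empty chunks is only a crude stand-in for a storage cost, not the paper's HOLAP cost model.

```python
from collections import Counter


def frequency_sort_normalization(cells, num_dims):
    """For each dimension, map every attribute value to a new index.

    Assumption: values appearing in more allocated cells get smaller
    indices; ties are broken by str(value) to keep the result deterministic.
    """
    orders = []
    for dim in range(num_dims):
        counts = Counter(cell[dim] for cell in cells)
        # One sort per dimension over that dimension's distinct values.
        ranked = sorted(counts, key=lambda v: (-counts[v], str(v)))
        orders.append({value: rank for rank, value in enumerate(ranked)})
    return orders


def normalize(cells, orders):
    """Rewrite each cell as a tuple of integer indices under the given orders."""
    return [tuple(orders[d][v] for d, v in enumerate(cell)) for cell in cells]


def count_nonempty_chunks(index_cells, chunk_shape):
    """Count chunks holding at least one allocated cell (illustrative proxy only)."""
    return len({
        tuple(i // s for i, s in zip(cell, chunk_shape))
        for cell in index_cells
    })


# Toy 2-dimensional cube: each tuple is one allocated cell.
cells = [("b", "x"), ("b", "y"), ("a", "x"), ("c", "x"), ("b", "z")]
orders = frequency_sort_normalization(cells, num_dims=2)
index_cells = normalize(cells, orders)
print(orders)
print(count_nonempty_chunks(index_cells, chunk_shape=(2, 2)))
```

In this sketch, each dimension is sorted independently, which matches the per-dimension sorting cost in the abstract's O(d n log n) bound; the initial counting pass over the allocated cells is extra work that depends on how the cube is stored.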