Efficient Online Aggregates in Dense-Region-Based Data Cube Representations

Authors:
Kais Haddadin; Tobias Lauer
Affiliations:
Jedox AG, Freiburg, Germany; Institute of Computer Science, University of Freiburg, Germany
Venue:
DaWaK '09 Proceedings of the 11th International Conference on Data Warehousing and Knowledge Discovery
Year:
2009



Abstract

In-memory OLAP systems require a space-efficient representation of sparse data cubes in order to accommodate large data sets. On the other hand, most efficient online aggregation techniques, such as prefix sums, are built on dense array-based representations. These are often not applicable to real-world data because of the size of the arrays, which usually cannot be compressed well, as most sparsity is removed during pre-processing. A possible solution is to identify dense regions in a sparse cube and represent only those using arrays, while storing sparse data separately, e.g., in a spatial index structure. Previous dense-region-based approaches have concentrated mainly on the effectiveness of the dense-region detection (i.e., on the space-efficiency of the result). However, especially in higher-dimensional cubes, data is usually more cluttered, resulting in a potentially large number of small dense regions, which negatively affects query performance on such a structure. In this paper, our focus is not only on space-efficiency but also on time-efficiency, both for the initial dense-region extraction and for queries carried out in the resulting hybrid data structure. We describe two methods to trade available memory for increased aggregate query performance. In addition, optimizations in our approach significantly reduce the time to build the initial data structure compared to earlier systems. We also present a straightforward adaptation of our approach to support multi-core or multi-processor architectures, which can further enhance query performance. Experiments with different real-world data sets show how various parameter settings can be used to adjust the efficiency and effectiveness of our algorithms.
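
The hybrid idea in the abstract (prefix-sum arrays for dense regions, a separate store for sparse cells) can be illustrated with a minimal two-dimensional sketch. This is not the authors' implementation: the names `build_prefix_sums`, `dense_range_sum`, and `HybridCube` are hypothetical, the dense regions are assumed to be given rather than detected, and a plain dictionary scanned linearly stands in for the spatial index structure the abstract mentions.

```python
def build_prefix_sums(region):
    """Return P where P[i][j] is the sum of region[0..i-1][0..j-1].

    P has one extra row and column of zeros so lookups need no bounds checks.
    """
    rows, cols = len(region), len(region[0])
    prefix = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        row_total = 0
        for j in range(cols):
            row_total += region[i][j]
            prefix[i + 1][j + 1] = prefix[i][j + 1] + row_total
    return prefix


def dense_range_sum(prefix, r0, c0, r1, c1):
    """Sum of cells r0..r1, c0..c1 (inclusive, region-local coordinates).

    Uses inclusion-exclusion on the prefix-sum array: four lookups per query.
    """
    return (prefix[r1 + 1][c1 + 1] - prefix[r0][c1 + 1]
            - prefix[r1 + 1][c0] + prefix[r0][c0])


class HybridCube:
    """Hypothetical hybrid store: dense regions as prefix sums, the rest sparse."""

    def __init__(self, dense_regions, sparse_points):
        # dense_regions: list of ((row_offset, col_offset), 2D list of values)
        # sparse_points: dict mapping (row, col) -> value for cells outside regions
        self.regions = [
            (offset, len(cells), len(cells[0]), build_prefix_sums(cells))
            for offset, cells in dense_regions
        ]
        self.sparse = dict(sparse_points)

    def range_sum(self, r0, c0, r1, c1):
        """Sum of all cells with r0 <= row <= r1 and c0 <= col <= c1."""
        total = 0
        for (row_off, col_off), rows, cols, prefix in self.regions:
            # Clip the query rectangle to this dense region.
            lo_r, hi_r = max(r0, row_off), min(r1, row_off + rows - 1)
            lo_c, hi_c = max(c0, col_off), min(c1, col_off + cols - 1)
            if lo_r <= hi_r and lo_c <= hi_c:
                total += dense_range_sum(
                    prefix,
                    lo_r - row_off, lo_c - col_off,
                    hi_r - row_off, hi_c - col_off,
                )
        # Assumption: a linear scan replaces a real spatial index (e.g. an R-tree).
        total += sum(
            value for (r, c), value in self.sparse.items()
            if r0 <= r <= r1 and c0 <= c <= c1
        )
        return total


cube = HybridCube(
    dense_regions=[((2, 3), [[1, 2], [3, 4]])],
    sparse_points={(0, 0): 5, (10, 10): 7},
)
print(cube.range_sum(0, 0, 3, 4))
```

In this sketch, each dense region contributes to a query in constant time regardless of its size, but every region must still be visited. With many small regions the per-region overhead dominates, which is the query-performance concern the abstract raises for higher-dimensional cubes.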