CubiST++: Evaluating Ad-Hoc CUBE Queries Using Statistics Trees

Authors:
Joachim Hammer; Lixin Fu
Affiliations:
Computer & Information Science & Eng., University of Florida, Gainesville, FL 32611-6120, USA. jhammer@cise.ufl.edu; Division of Computer Science, University of North Carolina, Greensboro, Greensboro, NC 27402-6170, USA. lfu@uncg.edu
Venue:
Distributed and Parallel Databases
Year:
2003

Abstract

We report on a new, efficient encoding for the data cube, which results in a drastic speed-up of OLAP queries that aggregate along any combination of dimensions over numerical and categorical attributes. We focus on a class of queries called cube queries, which return aggregated values rather than sets of tuples. Our approach, termed CubiST++ (Cubing with Statistics Trees Plus Families), departs significantly from existing relational (ROLAP) and multi-dimensional (MOLAP) approaches in that it does not use the view lattice to compute and materialize new views from existing views in some heuristic fashion. Instead, CubiST++ encodes all possible aggregate views in the leaves of a new data structure called a statistics tree (ST) during a one-time scan of the detailed data. To optimize queries that involve constraints on hierarchy levels of the underlying dimensions, we select and materialize a family of candidate trees, which represent superviews over the different hierarchical levels of the dimensions. Given a query, our query evaluation algorithm selects the smallest tree in the family that can provide the answer. Extensive evaluations of our prototype implementation have demonstrated its superior run-time performance and scalability when compared with existing MOLAP and ROLAP systems.
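
The two ideas in the abstract (aggregates stored in tree leaves during a single scan, and choosing the smallest tree that can answer a query) can be illustrated with a small Python sketch. It is not the authors' implementation. The abstract does not specify the tree layout, so the sketch makes several assumptions: one tree level per dimension, a hypothetical `STAR` child meaning "all values", leaves holding only COUNT and SUM, and a family modeled as `(levels, size)` pairs in which level 0 is the finest granularity. The names `StatisticsTree`, `STAR`, and `select_smallest_tree` are illustrative only.

```python
from itertools import product

STAR = "*"  # assumed marker for "all values" of a dimension


class StatisticsTree:
    """Toy tree: one level per dimension, [count, sum] kept in the leaves."""

    def __init__(self, num_dims):
        self.num_dims = num_dims
        self.root = {}

    def insert(self, key, measure):
        """Fold one record into every leaf reached by replacing any
        subset of its dimension values with STAR (2**num_dims leaves)."""
        for path in product(*[(value, STAR) for value in key]):
            node = self.root
            for value in path[:-1]:
                node = node.setdefault(value, {})
            leaf = node.setdefault(path[-1], [0, 0.0])
            leaf[0] += 1
            leaf[1] += measure

    def query(self, constraints):
        """constraints: per dimension, STAR or a set of accepted values.
        Returns (count, sum) over all matching records."""
        choices = [(STAR,) if c == STAR else tuple(set(c)) for c in constraints]
        count, total = 0, 0.0
        for path in product(*choices):
            node = self.root
            for value in path:
                node = node.get(value)
                if node is None:
                    break  # no record matches this combination
            else:
                count += node[0]
                total += node[1]
        return count, total


def select_smallest_tree(family, query_levels):
    """family: list of (levels, size) pairs, one per materialized tree.
    A tree qualifies if it is at least as fine as the query on every
    dimension; among qualifying trees the smallest one is returned."""
    candidates = [
        (size, levels)
        for levels, size in family
        if all(t <= q for t, q in zip(levels, query_levels))
    ]
    if not candidates:
        raise ValueError("no tree in the family can answer this query")
    return min(candidates)[1]


# One scan over the detailed data fills the tree.
tree = StatisticsTree(num_dims=2)
for state, product_name, sales in [("FL", "car", 10.0), ("FL", "bike", 5.0), ("NC", "car", 7.0)]:
    tree.insert((state, product_name), sales)

print(tree.query([{"FL"}, STAR]))  # count and total sales for FL over all products
```

In this sketch a cube query never touches the detailed data again: it only follows paths to precomputed leaves. The price is that each inserted record updates 2^k leaves for k dimensions, which is a property of this toy layout and not a claim about CubiST++ itself.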