High-dimensional OLAP: a minimal cubing approach

Authors:
Xiaolei Li;Jiawei Han;Hector Gonzalez
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Year:
2004

Abstract

The data cube has played an essential role in fast OLAP (online analytical processing) in many multi-dimensional data warehouses. However, there exist data sets in applications like bioinformatics, statistics, and text processing that are characterized by high dimensionality, e.g., over 100 dimensions, and moderate size, e.g., around 10^6 tuples. No feasible data cube can be constructed for such data sets. In this paper we address the problem of developing an efficient algorithm to perform OLAP on such data sets. Experience tells us that although data analysis tasks may involve a high-dimensional space, most OLAP operations are performed on only a small number of dimensions at a time. Based on this observation, we propose a novel method that computes a thin layer of the data cube together with associated value-list indices. This layer, while manageable in size, is capable of supporting flexible and fast OLAP operations in the original high-dimensional space. Through experiments we show that the method has I/O costs that scale nicely with dimensionality. Furthermore, the costs are comparable to those of accessing an existing data cube when full materialization is possible.
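
The abstract does not spell out how the thin layer and its value-list indices are organized, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes the dimensions are split into small disjoint groups (called fragments here), that every cell within a group stores the list of tuple IDs it covers, and that a query over a few dimensions is answered by intersecting those lists. All function names, the fragment size, and the sample rows are hypothetical.

```python
from itertools import combinations


def build_fragment_index(rows, fragment):
    """Map each (dims, values) cell of one fragment to the set of row IDs it covers."""
    index = {}
    for size in range(1, len(fragment) + 1):
        for dims in combinations(fragment, size):
            for tid, row in enumerate(rows):
                key = (dims, tuple(row[d] for d in dims))
                index.setdefault(key, set()).add(tid)
    return index


def build_shell_fragments(rows, num_dims, fragment_size):
    """Split dimensions 0..num_dims-1 into consecutive groups and index each group."""
    fragments = [
        tuple(range(start, min(start + fragment_size, num_dims)))
        for start in range(0, num_dims, fragment_size)
    ]
    return [(frag, build_fragment_index(rows, frag)) for frag in fragments]


def count_query(shell, num_rows, constraints):
    """Count rows matching {dimension: value} constraints by intersecting ID sets."""
    result = set(range(num_rows))
    for frag, index in shell:
        # Constrained dimensions that fall inside this fragment, in fragment order.
        dims = tuple(d for d in frag if d in constraints)
        if dims:
            values = tuple(constraints[d] for d in dims)
            result &= index.get((dims, values), set())
    return len(result)


# Hypothetical toy data: four tuples over four dimensions.
rows = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c1", "d2"),
    ("a2", "b1", "c1", "d1"),
    ("a1", "b1", "c2", "d1"),
]
shell = build_shell_fragments(rows, num_dims=4, fragment_size=2)
matches = count_query(shell, len(rows), {0: "a1", 3: "d1"})
print(matches)
```

In this sketch, a query that touches dimensions from two different fragments never needs a precomputed cuboid spanning both; it only intersects the ID sets stored for each fragment. With fragments of fixed size, the number of indexed cuboids grows linearly in the number of dimensions, whereas a full cube over 100 dimensions would have 2^100 cuboids.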