Dwarf: shrinking the PetaCube

Authors:
Yannis Sismanis;Antonios Deligiannakis;Nick Roussopoulos;Yannis Kotidis
Affiliations:
University of Maryland, College Park;University of Maryland, College Park;University of Maryland, College Park;AT&T Labs-Research
Venue:
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Year:
2002

Abstract

Dwarf is a highly compressed structure for computing, storing, and querying data cubes. Dwarf identifies prefix and suffix structural redundancies and factors them out by coalescing their storage. Prefix redundancy is high in dense areas of cubes, but suffix redundancy is significantly higher in sparse areas. Exploiting both condenses the exponential size of high-dimensional full cubes into a dramatically smaller data structure. Eliminating suffix redundancy yields an equally dramatic reduction in cube computation time, because recomputation of the redundant suffixes is avoided. This effect is multiplied when attributes in the cube are correlated. In this way, a 25-dimensional petabyte cube was shrunk to a 2.3 GB Dwarf cube in less than 20 minutes, a storage reduction ratio of 1:400000. Still, Dwarf provides 100% precision on cube queries and is a self-sufficient structure that requires no access to the fact table. What makes Dwarf practical is the automatic discovery, in a single pass over the fact table, of the prefix and suffix redundancies without user involvement or knowledge of the value distributions.

This paper describes the Dwarf structure and the Dwarf cube construction algorithm. Further optimizations are then introduced for improving clustering and query performance. Experiments with the current implementation include detailed comparisons, on real and synthetic datasets, against previously published techniques. The comparisons show that Dwarfs outperform these techniques by a wide margin on all counts: storage space, creation time, query response time, and updates of cubes.
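The prefix and suffix redundancies named in the abstract can be illustrated on a toy cube. The Python sketch below is not the Dwarf construction algorithm: it naively materializes the full cube, stores it as a trie (prefix sharing), and then merges identical sub-tries after the fact (suffix coalescing), whereas the paper describes discovering these redundancies in a single pass over the fact table. The fact table, the function names (`full_cube`, `build_trie`, `count_nodes`, `coalesce`) and the use of SUM as the aggregate are all assumptions made for illustration.

```python
from itertools import product

ALL = "*"  # marker for an aggregated ("ALL") dimension; illustrative choice


def full_cube(facts, ndims):
    """Sum every fact into all 2^ndims group-by cells (naive full cube)."""
    cube = {}
    for *dims, measure in facts:
        for mask in product((False, True), repeat=ndims):
            cell = tuple(ALL if m else d for d, m in zip(dims, mask))
            cube[cell] = cube.get(cell, 0) + measure
    return cube


def build_trie(cube):
    """Prefix sharing: cells with a common prefix share one path."""
    root = {}
    for cell, total in cube.items():
        node = root
        for key in cell[:-1]:
            node = node.setdefault(key, {})
        node[cell[-1]] = total
    return root


def count_nodes(node):
    """Count the nodes of the plain trie (no coalescing)."""
    if not isinstance(node, dict):
        return 0
    return 1 + sum(count_nodes(child) for child in node.values())


def coalesce(node, seen):
    """Suffix coalescing: identical sub-tries are kept only once in `seen`."""
    if not isinstance(node, dict):
        return node
    key = tuple(sorted((k, coalesce(v, seen)) for k, v in node.items()))
    return seen.setdefault(key, key)


# Hypothetical fact table: (store, customer, product, price).
facts = [
    ("S1", "C2", "P2", 70),
    ("S1", "C3", "P1", 40),
    ("S2", "C1", "P1", 90),
    ("S2", "C1", "P2", 50),
]

cube = full_cube(facts, ndims=3)
trie = build_trie(cube)
distinct = {}
coalesce(trie, distinct)

print("nodes with prefix sharing only:", count_nodes(trie))  # 13
print("nodes after suffix coalescing: ", len(distinct))      # 9
```

In this toy data, store S2 has a single customer, so the sub-trie for (S2, C1) is identical to the one for (S2, ALL) and is stored once. Sparse regions produce many such coincidences, which is consistent with the abstract's remark that suffix redundancy dominates in sparse areas.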