Dwarfs in the rearview mirror: how big are they really?

Authors:
Jens Dittrich;Lukas Blunschi;Marcos Antonio Vaz Salles
Affiliations:
ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland
Venue:
Proceedings of the VLDB Endowment
Year:
2008

References (20):
Quickly generating billion-record synthetic databases. SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Research problems in data warehousing. CIKM '95 Proceedings of the fourth international conference on Information and knowledge management
“One size fits all” database architectures do not work for DSS. SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
An overview of data warehousing and OLAP technology. ACM SIGMOD Record
The art of computer programming, volume 3: (2nd ed.) sorting and searching
Bottom-up computation of sparse and Iceberg CUBE. SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
A decomposition storage model. SIGMOD '85 Proceedings of the 1985 ACM SIGMOD international conference on Management of data
On searching transposed files. ACM Transactions on Database Systems (TODS)
Dwarf: shrinking the PetaCube. Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Teaching an OLTP Database Kernel Advanced Data Warehousing Techniques. ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Index Selection for OLAP. ICDE '97 Proceedings of the Thirteenth International Conference on Data Engineering
Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Total. ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
Fast Computation of Sparse Datacubes. VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
QC-trees: an efficient summary structure for semantic OLAP. Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Hierarchical dwarfs for the rollup cube. DOLAP '03 Proceedings of the 6th ACM international workshop on Data warehousing and OLAP
Fast Computation of Iceberg Dwarf. SSDBM '04 Proceedings of the 16th International Conference on Scientific and Statistical Database Management
Bridging the gap between OLAP and SQL. VLDB '05 Proceedings of the 31st international conference on Very large data bases
Data mining with the SAP NetWeaver BI accelerator. VLDB '06 Proceedings of the 32nd international conference on Very large data bases
The polynomial complexity of fully materialized coalesced cubes. VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
PMC: select materialized cells in data cubes. DaWaK'05 Proceedings of the 7th international conference on Data Warehousing and Knowledge Discovery

Cited by (5):
Efficient updates for a shared nothing analytics platform. Proceedings of the 2010 Workshop on Massive Data Analytics on the Cloud
Distributing the power of OLAP. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Subspace similarity search: efficient k-NN queries in arbitrary subspaces. SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
Brown Dwarf: A fully-distributed, fault-tolerant data warehousing system. Journal of Parallel and Distributed Computing
Topological XML data cube construction. International Journal of Web Engineering and Technology

Abstract

Online Analytical Processing (OLAP) has been a field of competing technologies for the past ten years. One of its still unsolved challenges is how to provide quick response times on any terabyte-sized business data problem. Recently, a very clever multi-dimensional index structure termed Dwarf [26] was proposed that offers excellent query response times as well as unmatched index compression rates. The proposed index seems to scale well both to large data sets and to high dimensionality. Motivated by these surprisingly good results, we take a look into the rearview mirror. We have re-implemented the Dwarf index from scratch and make three contributions. First, we successfully repeat several of the experiments of the original paper. Second, we substantially correct some of the experimental results reported by the inventors; some of our results differ by orders of magnitude. To understand these differences, we provide additional experiments that better explain the behavior of the Dwarf index. Third, we provide missing experiments comparing Dwarf to baseline query processing strategies. This should give practitioners better guidance on the cases in which Dwarf indexes could be useful in practice.