A Hilbert space compression architecture for data warehouse environments

Authors:
Todd Eavis; David Cueva
Affiliations:
Concordia University, Montreal, Canada; Concordia University, Montreal, Canada
Venue:
DaWaK '07: Proceedings of the 9th International Conference on Data Warehousing and Knowledge Discovery
Year:
2007

Abstract

Multi-dimensional data sets are common in areas such as data warehousing and statistical databases, where core tables often grow to enormous sizes. Compression is an attractive way to reduce storage requirements and thereby permit the retention of even larger data sets. In this paper we discuss an efficient compression framework designed specifically for very large relational database implementations. The primary methods exploit a Hilbert space-filling curve to dramatically reduce the storage footprint of the underlying tables. Tuples are compressed individually and packed into page-sized units, so that only the blocks relevant to the user's multi-dimensional query need be accessed. Compression is available not only for the relational tables themselves but also for the associated R-tree indexes. Experimental results demonstrate compression rates of more than 90% for multi-dimensional data and up to 98% for the indexes.
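
The abstract does not spell out the encoding, so the following is only an illustrative sketch of why a Hilbert ordering can help compression, not the authors' implementation. It assumes a two-dimensional grid whose side is a power of two, uses the standard iterative mapping from a cell to its Hilbert ordinal, and then stores sorted ordinals as gaps between neighbours. The function names (`xy2d`, `delta_encode`, `delta_decode`, `bits_per_gap`) and the gap-based encoding are assumptions introduced for this example.

```python
def xy2d(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert ordinal."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def delta_encode(ordinals):
    """Sort the ordinals and keep the first value followed by successive gaps."""
    ordered = sorted(ordinals)
    if not ordered:
        return []
    return [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]


def delta_decode(deltas):
    """Rebuild the sorted ordinals from the first value and the gaps."""
    out, total = [], 0
    for gap in deltas:
        total += gap
        out.append(total)
    return out


def bits_per_gap(deltas):
    """Fixed width, in bits, that can hold every gap after the first value."""
    return max((g.bit_length() for g in deltas[1:]), default=0)


if __name__ == "__main__":
    # Hypothetical tuples with two dimension attributes on a 4-by-4 grid.
    points = [(3, 0), (1, 1), (0, 2), (1, 0)]
    ordinals = [xy2d(4, x, y) for x, y in points]
    deltas = delta_encode(ordinals)
    assert delta_decode(deltas) == sorted(ordinals)
    print(deltas, bits_per_gap(deltas))
```

Because cells that are consecutive along the curve are also adjacent in the grid, clustered tuples tend to produce small gaps, which need fewer bits than the full coordinates. A page-oriented design like the one the abstract describes would presumably apply such an encoding within each page so that a page can be decoded on its own, though that detail is an assumption here.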