Database Compression Using an Offline Dictionary Method

Authors:
Abu Sayed Md. Latiful Hoque;Douglas R. McGregor;John Wilson
Venue:
ADVIS '02 Proceedings of the Second International Conference on Advances in Information Systems
Year:
2002

References:
Data compression on a database system. Communications of the ACM.
Data compression: the complete reference.
A decomposition storage model. SIGMOD '85 Proceedings of the 1985 ACM SIGMOD international conference on Management of data.
Data compression via textual substitution. Journal of the ACM (JACM).
Experiments in text file compression. Communications of the ACM.
The implementation and performance of compressed databases. ACM SIGMOD Record.
Compressing Relations and Indexes. ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering.
Data Compression in Database Systems. IDEAS '98 Proceedings of the 1998 International Symposium on Database Engineering & Applications.
A Technique for High-Performance Data Compression. Computer.

Abstract

Off-line dictionary compression is becoming more attractive for applications where data are searched directly in compressed form. While there has been a large body of related work describing specific database compression algorithms, the Hibase [10] architecture is unique in processing queries on compressed data. However, this technique does not compress the representation of strings in the domain dictionaries, so primary keys, high-cardinality data and semi-structured data yield little or no compression. To achieve high performance irrespective of the type of data, the string representation must also be in compressed form. At the same time, the direct addressability of compressed data must be maintained, which rules out serial compression techniques. In this paper, we present a prefix dictionary-based off-line method that can be incorporated into systems like Hibase, where compressed data can be accessed directly without prior decompression. The complexity is O(n) in time and space.
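
The general idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it assumes a plain sorted domain dictionary for one column and swaps in ordinary block-wise front coding as a stand-in for a prefix-based dictionary. The function names (build_dictionary, front_code, lookup), the block size and the sample column are all hypothetical and chosen only for illustration.

```python
from os.path import commonprefix


def build_dictionary(values):
    """Map each distinct string to a small integer code (sorted order)."""
    domain = sorted(set(values))
    return domain, {s: i for i, s in enumerate(domain)}


def front_code(domain, block_size=4):
    """Store the first string of each block whole; store every other string
    as (length of prefix shared with its predecessor, remaining suffix)."""
    blocks = []
    for start in range(0, len(domain), block_size):
        chunk = domain[start:start + block_size]
        entries = []
        for prev, cur in zip(chunk, chunk[1:]):
            shared = len(commonprefix([prev, cur]))
            entries.append((shared, cur[shared:]))
        blocks.append((chunk[0], entries))
    return blocks


def lookup(blocks, code, block_size=4):
    """Decode one string by its code, touching only the block that holds it."""
    head, entries = blocks[code // block_size]
    current = head
    for shared, suffix in entries[:code % block_size]:
        current = current[:shared] + suffix
    return current


# Hypothetical sample column (illustrative data, not from the paper).
column = ["glasgow", "glass", "glasgow", "gleam", "glass", "glen", "glasgow"]
domain, codes = build_dictionary(column)
encoded = [codes[v] for v in column]          # the column as integer codes
blocks = front_code(domain, block_size=2)     # compressed string dictionary

# Equality query evaluated on codes only; no string is decompressed.
target = codes["glass"]
matching_rows = [row for row, c in enumerate(encoded) if c == target]
```

In this sketch the column is searched by comparing integer codes, and a single dictionary string can be recovered from its code by decoding at most one block, which is one simple way to keep compressed strings directly addressable. A serial compressor applied to the whole dictionary would instead require decoding from the start to reach an arbitrary entry.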