In this paper we explore the compression of large statistical databases and propose techniques for organizing the compressed data so that the time required to access it is logarithmic. Our techniques are variations of run-length encoding in which modified run-lengths for the series are extracted from the data stream and stored in a header; the header forms the base level of a B-tree index into the database. Because the run-lengths are cumulative, the access time is logarithmic in the size of the header. We discuss the details of the compression scheme and its implementation, present several special cases, and give an analysis of the relative performance of the various versions.
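The core idea of cumulative run-lengths can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `compress` and `lookup` are hypothetical, plain run-length encoding stands in for the paper's modified run-lengths, and a binary search over a sorted in-memory list stands in for the B-tree built on the header. It only shows why storing cumulative counts makes positional access logarithmic in the number of runs.

```python
from bisect import bisect_right
from itertools import groupby


def compress(series):
    """Run-length encode `series` and return (header, values).

    header[i] holds the cumulative number of elements up to and
    including run i; values[i] is the value repeated in run i.
    (Hypothetical helper, assumed for illustration only.)
    """
    header, values, total = [], [], 0
    for value, run in groupby(series):
        total += sum(1 for _ in run)  # length of this run
        header.append(total)          # cumulative, so header is sorted
        values.append(value)
    return header, values


def lookup(header, values, position):
    """Return the element at 0-based `position` of the original series."""
    size = header[-1] if header else 0
    if not 0 <= position < size:
        raise IndexError(position)
    # Binary search for the first run whose cumulative count exceeds
    # `position`; this stands in for descending a B-tree over the header.
    return values[bisect_right(header, position)]


series = [0, 0, 0, 5, 5, 7, 0, 0]
header, values = compress(series)
print(header, values)
print([lookup(header, values, i) for i in range(len(series))] == series)
```

Because the header is sorted by construction, any ordered index over it supports the same search; the sorted list here is only the simplest such structure.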