Tables are two-dimensional arrays given in row-major order. Such data have distinctive features that can be exploited for effective compression. For example, tables often represent database files whose rows are records, so certain columns (fields) may have few distinct values. Simply transposing such a table, so that each column's values are stored contiguously, can therefore make it compress better. A further large source of redundancy in a table is the correlation among columns representing related types of data. This paper formalizes the notion of column dependency as a way to capture this redundancy across columns, and discusses how to compute it automatically and use it to substantially improve table compression.
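The transposition idea can be tried with a minimal Python sketch. It is an illustration only, not the paper's method: the synthetic table layout, the helper names `make_table` and `transpose`, and the use of `zlib` as a stand-in general-purpose compressor are all assumptions made here. The sketch builds a table of fixed-width 4-byte records in which two columns take few distinct values, then compresses the row-major and column-major layouts for comparison.

```python
import random
import zlib


def transpose(table: bytes, n_rows: int, n_cols: int) -> bytes:
    """Reorder a row-major table of single-byte cells into column-major order."""
    if len(table) != n_rows * n_cols:
        raise ValueError("table size does not match n_rows * n_cols")
    # A slice with stride n_cols picks out one column at a time.
    return b"".join(table[c::n_cols] for c in range(n_cols))


def make_table(n_rows: int, seed: int = 0) -> bytes:
    """Build a hypothetical table: columns 0 and 2 have few distinct values."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        rows.append(bytes([
            rng.choice(b"AB"),    # low-cardinality field
            rng.randrange(256),   # high-entropy field
            rng.choice(b"XYZ"),   # low-cardinality field
            rng.randrange(256),   # high-entropy field
        ]))
    return b"".join(rows)


if __name__ == "__main__":
    n_rows, n_cols = 10_000, 4
    row_major = make_table(n_rows)
    col_major = transpose(row_major, n_rows, n_cols)
    print("row-major compressed size:   ", len(zlib.compress(row_major, 9)))
    print("column-major compressed size:", len(zlib.compress(col_major, 9)))
```

Transposing the column-major bytes back with `transpose(col_major, n_cols, n_rows)` recovers the original table, so the reordering itself loses no information. The sketch deliberately ignores correlation between columns, which is the redundancy that column dependency is meant to capture.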