We present a method to compress relations close to their entropy while still allowing efficient queries. Column values are encoded into variable-length codes to exploit skew in their frequencies. The codes in each tuple are concatenated, and the resulting tuplecodes are sorted and delta-coded to exploit the lack of ordering in a relation. Correlation is exploited either by co-coding correlated columns or by using a sort order that leverages the correlation. We prove that this method leads to near-optimal compression (within 4.3 bits/tuple of entropy), and in practice we obtain up to a 40-fold compression ratio on vertical partitions tuned for TPC-H queries.

We also describe initial investigations into efficient querying over compressed data. We present a novel Huffman coding scheme, called segregated coding, that allows range and equality predicates on compressed data without accessing the full dictionary. We also exploit the delta coding to speed up scans by reusing computations performed on nearly identical records. Initial results from a prototype suggest that, with these optimizations, we can efficiently scan, tokenize, and apply predicates on compressed relations.
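The sort-then-delta step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it treats each tuplecode as a single integer and omits the variable-length integer coder that would actually store the small deltas compactly. Because a relation is an unordered set, any order may be chosen; sorting makes neighbouring tuplecodes similar, so the deltas are small and cheap to encode.

```python
def delta_encode(tuplecodes):
    """Sort integer tuplecodes and keep only successive differences.

    The first element is stored as-is; every later element is stored
    as its gap from the previous (sorted) tuplecode.
    """
    ordered = sorted(tuplecodes)
    deltas = [ordered[0]]
    for prev, cur in zip(ordered, ordered[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Invert delta_encode: a running prefix sum restores the sorted codes."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out
```

For example, `delta_encode([9, 2, 5])` yields `[2, 3, 4]`, and decoding those deltas restores the sorted multiset `[2, 5, 9]`. Decompression recovers the relation's contents (as a set), not the original row order, which is exactly the freedom the scheme exploits.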
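To give intuition for how predicates can run on compressed data, here is a toy, canonical-Huffman-style code assignment in the spirit of segregated coding. It assumes code lengths have already been fixed by some Huffman construction (supplied by hand below), and it assigns codes within each length class in value order, so comparing two equal-length codes as integers compares the underlying values without consulting the dictionary. The paper's actual segregated-coding scheme differs in its details; this sketch only shows the ordering idea.

```python
def assign_segregated_codes(values_by_length):
    """Assign prefix-free codes, value-ordered within each length class.

    values_by_length maps a code length to the list of values that
    receive codes of that length, in increasing value order.
    Returns {value: (code, length)}.
    """
    codes = {}
    next_code = 0
    prev_len = 0
    for length in sorted(values_by_length):
        # Canonical-style step: extend the running code by the extra bits.
        next_code <<= (length - prev_len)
        prev_len = length
        for value in values_by_length[length]:
            codes[value] = (next_code, length)
            next_code += 1
    return codes
```

With `assign_segregated_codes({1: ['a'], 2: ['b', 'c']})`, the frequent value `a` gets the 1-bit code `0`, while `b` and `c` get the 2-bit codes `10` and `11`; since `b < c` and `10 < 11`, a range predicate over same-length codes can be evaluated directly on the compressed bits.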