How to barter bits for chronons: compression and bandwidth trade offs for database scans

Authors:
Allison L. Holloway;Vijayshankar Raman;Garret Swart;David J. DeWitt
Affiliations:
University of Wisconsin, Madison, WI;IBM Almaden Research Center, San Jose, CA;IBM Almaden Research Center, San Jose, CA;University of Wisconsin, Madison, WI
Venue:
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
Year:
2007

Abstract

Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query processing workload. Second, large memories and compression are making table scans CPU-bound rather than disk-bandwidth-bound. Data warehouse systems have found that they can avoid the unpredictability of joins and indexing and achieve good performance by using massively parallel processing to perform scans over compressed vertical partitions of a denormalized schema. In this paper we present a study of how to make such scans faster by using a scan code generator that produces code tuned to the database schema, the compression dictionaries, the queries being evaluated, and the target CPU architecture. We investigate a variety of compression formats and propose two novel optimizations, tuple length quantization and a field length lookup table, for efficiently processing variable-length fields and tuples. We present a detailed experimental study of the performance of generated scans against these compression formats, and use it to explore the trade-off between compression quality and scan speed. We also introduce new strategies for removing instruction-level dependencies and increasing instruction-level parallelism, allowing greater exploitation of multi-issue processors.