Two trends are converging to make the CPU cost of a table scan a more important component of database performance. First, table scans are becoming a larger fraction of the query processing workload. Second, large memories and compression are making table scans CPU-bound rather than disk-bandwidth-bound. Data warehouse systems have found that they can avoid the unpredictability of joins and indexing and achieve good performance by using massively parallel processing to perform scans over compressed vertical partitions of a denormalized schema. In this paper we present a study of how to make such scans faster by using a scan code generator that produces code tuned to the database schema, the compression dictionaries, the queries being evaluated, and the target CPU architecture. We investigate a variety of compression formats and propose two novel optimizations, tuple length quantization and a field length lookup table, for efficiently processing variable-length fields and tuples. We present a detailed experimental study of the performance of generated scans against these compression formats, and use it to explore the trade-off between compression quality and scan speed. We also introduce new strategies for removing instruction-level dependencies and increasing instruction-level parallelism, allowing greater exploitation of multi-issue processors.
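The abstract names a field length lookup table without describing its layout. As a rough illustration only, the sketch below shows one common way such a table can work for prefix-free variable-length codes: the next few bits of the stream index a table that returns the length of the field starting there, so a scan can step from field to field without decoding any values. The example codes, the function names `build_length_table` and `split_fields`, and the use of bit strings instead of machine words are all assumptions made for readability, not the paper's implementation.

```python
from typing import List


def build_length_table(codes: List[str]) -> List[int]:
    """Map every max_len-bit window to the length of the code that begins it.

    `codes` must be prefix-free bit strings (hypothetical example input).
    """
    max_len = max(len(code) for code in codes)
    table = [0] * (1 << max_len)
    for code in codes:
        pad = max_len - len(code)
        base = int(code, 2) << pad
        # Every window that starts with `code` gets the same length entry.
        for suffix in range(1 << pad):
            table[base + suffix] = len(code)
    return table


def split_fields(bits: str, table: List[int], max_len: int) -> List[str]:
    """Split a compressed bit string into fields using only table lookups."""
    fields = []
    pos = 0
    while pos < len(bits):
        # Peek at the next max_len bits, zero-padded at the end of the stream.
        window = bits[pos:pos + max_len].ljust(max_len, "0")
        length = table[int(window, 2)]
        if length == 0 or pos + length > len(bits):
            raise ValueError(f"invalid or truncated code at bit {pos}")
        fields.append(bits[pos:pos + length])
        pos += length
    return fields


# Hypothetical prefix-free dictionary codes: frequent values get short codes.
CODES = ["0", "10", "110", "111"]
TABLE = build_length_table(CODES)
FIELDS = split_fields("0110101110", TABLE, max_len=3)
```

A production scanner would presumably keep the stream in machine words and use shifts and masks rather than string slicing; the table lookup itself is the part this sketch is meant to show.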