Self-organizing tuple reconstruction in column-stores

Authors:
Stratos Idreos;Martin L. Kersten;Stefan Manegold
Affiliations:
CWI, Amsterdam, Netherlands;CWI, Amsterdam, Netherlands;CWI, Amsterdam, Netherlands
Venue:
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Year:
2009

Abstract

Column-stores have gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column, allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.
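
To make the idea of a cracker map concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a map over two attributes A and B stored as (a, b) pairs, a range selection on A, and a projection of B. The names CrackerMap, select_project and reconstruct_by_position are hypothetical and chosen only for illustration. Partial materialization, chunks and updates, which the abstract describes, are deliberately left out.

```python
from bisect import insort


def reconstruct_by_position(col_a, col_b, low, high):
    """Baseline: select on A, then fetch B by tuple ID (here, the position)."""
    ids = [i for i, a in enumerate(col_a) if low <= a < high]
    return [col_b[i] for i in ids]


class CrackerMap:
    """Hypothetical map over (A, B) pairs, reorganized as a side effect of queries."""

    def __init__(self, col_a, col_b):
        # Pair the two columns once; position in the base columns is the tuple ID.
        self.pairs = list(zip(col_a, col_b))
        # Sorted (pivot, pos) entries. Invariant: every pair before pos has
        # a < pivot, and every pair at or after pos has a >= pivot.
        self.index = []

    def _crack(self, pivot):
        """Partition the piece that contains pivot and return the split position."""
        lo, hi = 0, len(self.pairs)
        for p, pos in self.index:
            if p == pivot:
                return pos  # already cracked on this value, nothing to do
            if p < pivot:
                lo = pos    # tighten the lower bound of the enclosing piece
            else:
                hi = pos    # first larger pivot closes the piece
                break
        # In-place partition of pairs[lo:hi]: values below pivot move to the front.
        i = lo
        for j in range(lo, hi):
            if self.pairs[j][0] < pivot:
                self.pairs[i], self.pairs[j] = self.pairs[j], self.pairs[i]
                i += 1
        insort(self.index, (pivot, i))
        return i

    def select_project(self, low, high):
        """Return the B values of all tuples with low <= a < high."""
        start = self._crack(low)
        end = self._crack(high)
        # Qualifying pairs are now contiguous, so B is read without a join.
        return [b for _, b in self.pairs[start:end]]


col_a = [12, 3, 7, 19, 5, 15, 1, 9]
col_b = ["a", "b", "c", "d", "e", "f", "g", "h"]
cmap = CrackerMap(col_a, col_b)
print(sorted(cmap.select_project(5, 13)))
print(sorted(reconstruct_by_position(col_a, col_b, 5, 13)))
```

In this sketch each query pays for partitioning only the piece its bounds fall into, and later queries reuse the recorded split positions. The order of the returned B values depends on earlier reorganization, which is why the usage lines sort the results before comparing them with the baseline.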