Column-stores have gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column, allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, partial sideways cracking, that minimizes the tuple reconstruction cost in a self-organizing way. It achieves performance similar to using presorted data, but without requiring the heavy initial presorting step itself. Instead, it handles dynamic, unpredictable workloads with no idle time and frequent updates. Auxiliary dynamic data structures, called cracker maps, provide a direct mapping between pairs of attributes used together in queries for tuple reconstruction. A map is continuously physically reorganized as an integral part of query evaluation, providing faster and reduced data access for future queries. To enable flexible and self-organizing behavior in storage-limited environments, maps are materialized only partially, as demanded by the workload. Each map is a collection of separate chunks that are individually reorganized, dropped, or recreated as needed. We implemented partial sideways cracking in an open-source column-store. A detailed experimental analysis demonstrates that it brings significant performance benefits for multi-attribute queries.
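
To make the cracker-map idea concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not the implementation described in the paper: the class name `CrackerMap`, its methods, and the `bounds` index are all hypothetical, the map lives in plain Python lists, pieces are rebuilt by copying rather than swapped in place, and updates and partial (chunk-wise) materialization are left out. The map pairs a selection attribute A (the head) with a projected attribute B (the tail). Each range query partitions only the pieces its bounds fall into, so the matching B values end up contiguous and no join on tuple IDs is needed to reconstruct them.

```python
from bisect import bisect_left, insort


class CrackerMap:
    """Hypothetical two-column map: head = attribute A, tail = attribute B."""

    def __init__(self, a_values, b_values):
        # Pair the columns by position, which stands in for the tuple ID.
        self.pairs = list(zip(a_values, b_values))
        # Sorted (pivot, pos) entries: every pair before pos has head < pivot.
        self.bounds = []

    def _crack(self, pivot):
        """Partition the piece containing pivot.

        Returns the first position whose head value is >= pivot.
        """
        i = bisect_left(self.bounds, (pivot,))
        if i < len(self.bounds) and self.bounds[i][0] == pivot:
            return self.bounds[i][1]  # already cracked on this pivot
        # The piece to split lies between the two neighbouring boundaries.
        lo = self.bounds[i - 1][1] if i > 0 else 0
        hi = self.bounds[i][1] if i < len(self.bounds) else len(self.pairs)
        piece = self.pairs[lo:hi]
        smaller = [p for p in piece if p[0] < pivot]
        larger = [p for p in piece if p[0] >= pivot]
        # Simplification: rebuild the piece by copying instead of swapping.
        self.pairs[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        insort(self.bounds, (pivot, pos))
        return pos

    def select(self, low, high):
        """Return the B values of all tuples with low <= A < high."""
        start = self._crack(low)
        end = self._crack(high)
        return [b for _, b in self.pairs[start:end]]


if __name__ == "__main__":
    m = CrackerMap([5, 1, 9, 3, 7], ["e", "a", "i", "c", "g"])
    print(m.select(3, 8))  # B values for 3 <= A < 8, in cracked order
    print(m.bounds)        # piece boundaries learned from the query
```

In this sketch, repeating a query with the same bounds touches no data beyond the final slice, and a query with new bounds partitions only the one or two pieces those bounds fall into. That is the sense in which reorganization happens as part of query evaluation and benefits later queries.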