CDDTA-JOIN: one-pass OLAP algorithm for column-oriented databases

Authors:
Min Jiao;Yansong Zhang;Yan Sun;Shan Wang;Xuan Zhou
Affiliations:
DEKE Lab, Renmin University of China, Beijing, China and School of Information, Renmin University of China, Beijing, China;National Survey Research Center, Renmin University of China, Beijing, China;DEKE Lab, Renmin University of China, Beijing, China and School of Information, Renmin University of China, Beijing, China;DEKE Lab, Renmin University of China, Beijing, China and School of Information, Renmin University of China, Beijing, China;DEKE Lab, Renmin University of China, Beijing, China and School of Information, Renmin University of China, Beijing, China
Venue:
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Year:
2012

Abstract

Row-stores commonly use a volcano-style "once-a-tuple" pipeline processor for processing efficiency, but they lose I/O efficiency when only a small fraction of the columns in a wide table is accessed. Academic column-stores usually use "once-a-column" style processing for I/O and cache efficiency, but they suffer from multi-pass column scans on complex queries. This paper focuses on how to obtain the maximal gains from both storage models, combining pipeline processing efficiency with column processing efficiency. Based on the "address-value" mapping for surrogate keys in dimension tables, we map incremental primary keys to offset addresses, so that the foreign keys in the fact table can be used as a native join index for dimensional tuples. We use predicate vectors as bitmap filters on the dimensions, which makes star-join a pipeline operator, and we pre-generate hash aggregators for column-based aggregation. With these approaches, star-join and pre-grouping are completed in a one-pass scan over the dimensional attributes in the fact table, and the subsequent scan of the aggregate columns performs the sparse-access aggregation. We thus gain both I/O efficiency from vector processing and CPU efficiency from pipelined aggregation. We run experiments on both a simulated column-based implementation of the algorithm and a commercial column-store database.
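
The one-pass idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes surrogate keys are 0-based offsets into the dimension columns, models a predicate vector as a plain list of booleans, and uses an ordinary dictionary where the paper speaks of pre-generated hash aggregators. All function names and the toy data are hypothetical.

```python
from collections import defaultdict


def build_predicate_vector(dim_column, predicate):
    """One flag per dimension row; position i stands for surrogate key i."""
    return [predicate(value) for value in dim_column]


def one_pass_star_join(fk_columns, predicate_vectors, group_columns, measure):
    """Star-join and pre-grouping in one scan, then sparse aggregation.

    fk_columns[d][row] is the surrogate key of dimension d for a fact row.
    Because keys are assumed to be array offsets, predicate_vectors[d][key]
    and group_columns[d][key] are direct lookups, not hash probes.
    """
    # Phase 1: a single scan over the foreign key columns of the fact table.
    survivors = []  # (fact row id, group key) for rows passing every filter
    for row in range(len(measure)):
        keys = [fk[row] for fk in fk_columns]
        if all(pv[k] for pv, k in zip(predicate_vectors, keys)):
            group = tuple(gc[k] for gc, k in zip(group_columns, keys))
            survivors.append((row, group))

    # Phase 2: read the measure column only at the surviving positions.
    totals = defaultdict(int)
    for row, group in survivors:
        totals[group] += measure[row]
    return dict(totals)


# Hypothetical toy star schema: two dimensions and one measure column.
region = ["ASIA", "EUROPE", "ASIA", "AMERICA"]  # customer keys 0..3
year = [1997, 1998]                             # date keys 0..1
fk_customer = [0, 1, 2, 3, 1, 0]
fk_date = [0, 1, 1, 1, 1, 1]
revenue = [10, 20, 30, 40, 50, 60]

result = one_pass_star_join(
    fk_columns=[fk_customer, fk_date],
    predicate_vectors=[
        build_predicate_vector(region, lambda r: r != "AMERICA"),
        build_predicate_vector(year, lambda y: y == 1998),
    ],
    group_columns=[region, year],
    measure=revenue,
)
print(result)
```

In this sketch the foreign key columns are scanned exactly once, and the measure column is read only for the four fact rows that pass both filters, which is the sparse access pattern the abstract refers to.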