Join directly on heavy-weight compressed data in column-oriented database

Authors:
Gan Liang;Li RunHeng;Jia Yan;Jin Xin
Affiliations:
School of Computer Science, National University of Defense Technology, HuNan, China;School of Computer Science, National University of Defense Technology, HuNan, China;School of Computer Science, National University of Defense Technology, HuNan, China;School of Software, ChangSha Social Work College, HuNan, China
Venue:
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Year:
2010

References:
C-store: a column-oriented DBMS. VLDB '05 Proceedings of the 31st international conference on Very large data bases
Integrating compression and execution in column-oriented database systems. Proceedings of the 2006 ACM SIGMOD international conference on Management of data

Abstract

Operating directly on compressed data can reduce CPU cost. Many light-weight compression schemes, such as run-length encoding and bit-vector encoding, gain this benefit easily, but heavy-weight Lempel-Ziv (LZ) compression has had no method for operating directly on compressed data. We propose a join algorithm, LZ join, which joins two relations R and S directly on compressed data during decoding. R is the probe table and is encoded with LZ; S is the build table. While R probes S, LZ join lowers the join cost by reusing cached results: when the decoder finds that a sequence of R's IDs repeats one already in the LZ dictionary window, it reuses the join results previously computed for those IDs. LZ join combines the decoding and join phases into one, which avoids the memory needed to decode all of R and the CPU overhead of probing S again for IDs whose results are cached. Our analysis and experiments show that LZ join performs better in some cases, and that the higher the compression ratio, the larger the benefit.
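
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token format (`("lit", id)` for a literal ID and `("copy", offset, length)` for a back-reference), the function names `build_hash_table` and `lz_join`, and the unbounded window are all assumptions made for illustration. A literal ID probes the hash table built on S; a back-reference copies the join results cached for the earlier occurrence and performs no probe.

```python
from collections import defaultdict


def build_hash_table(s_rows):
    """Build phase: map each join key of S to the list of its payloads."""
    table = defaultdict(list)
    for key, payload in s_rows:
        table[key].append(payload)
    return table


def lz_join(tokens, s_rows):
    """Join an LZ77-style token stream for R's ID column against S.

    Hypothetical token format (an assumption, not taken from the paper):
      ("lit", r_id)           a literal ID
      ("copy", offset, length) repeat `length` IDs starting `offset` back
    Returns (joined pairs, number of hash probes performed).
    """
    table = build_hash_table(s_rows)
    ids = []      # decoded IDs; stands in for the LZ window (unbounded here)
    cached = []   # cached[i] holds the join matches found for ids[i]
    output = []
    probes = 0

    for token in tokens:
        if token[0] == "lit":
            r_id = token[1]
            matches = table.get(r_id, [])  # the only place S is probed
            probes += 1
            ids.append(r_id)
            cached.append(matches)
            output.extend((r_id, p) for p in matches)
        else:
            _, offset, length = token
            start = len(ids) - offset
            if offset <= 0 or start < 0:
                raise ValueError("back-reference points outside the window")
            # Index per element so overlapping copies (length > offset) work.
            for i in range(length):
                r_id = ids[start + i]
                matches = cached[start + i]  # reuse cached result, no probe
                ids.append(r_id)
                cached.append(matches)
                output.extend((r_id, p) for p in matches)

    return output, probes


if __name__ == "__main__":
    # R decodes to the ID sequence 1, 2, 1, 2, 1, 2, 3.
    tokens = [("lit", 1), ("lit", 2), ("copy", 2, 4), ("lit", 3)]
    s_rows = [(1, "a"), (2, "b"), (2, "c")]
    pairs, probes = lz_join(tokens, s_rows)
    print(len(pairs), "joined pairs using", probes, "probes")
```

In this example R decodes to seven IDs but only the three literals probe S, which is consistent with the abstract's claim that a higher compression ratio leaves fewer probes to perform. A real decoder would bound the window, and the cached results with it, to keep memory usage fixed.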