Join directly on heavy-weight compressed data in column-oriented database

  • Authors:
  • Gan Liang;Li RunHeng;Jia Yan;Jin Xin

  • Affiliations:
  • School of Computer Science, National University of Defense Technology, HuNan, China;School of Computer Science, National University of Defense Technology, HuNan, China;School of Computer Science, National University of Defense Technology, HuNan, China;School of Software, ChangSha Social Work College, HuNan, China

  • Venue:
  • WAIM'10 Proceedings of the 11th international conference on Web-age information management
  • Year:
  • 2010

Abstract

Operating directly on compressed data can decrease CPU costs. Many light-weight compression schemes, such as run-length encoding and bit-vector encoding, gain this benefit easily. Heavy-weight Lempel-Ziv (LZ) compression, however, has had no method for operating directly on compressed data. We propose a join algorithm, LZ join, which joins two relations R and S directly on compressed data during decoding. R, which is encoded with LZ, is treated as the probe table and S as the build table. While R probes S, LZ join reduces the join cost by using cached results: when the decoder finds that a sequence of R's IDs already appears in the LZ dictionary window, it reuses the earlier join results for those IDs instead of probing S again. LZ join combines the decoding and join phases into one, which reduces the memory needed to decode the whole of R and the CPU overhead of probing for IDs whose results are already cached. Our analysis and experiments show that LZ join performs better in some cases, and that its advantage grows with the compression ratio.
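
The abstract gives no pseudocode, so the following Python is only a minimal sketch of the idea it describes, not the authors' implementation. It assumes an LZ77-style token stream for R's ID column, where a token is either a literal ID or a back-reference (distance, length) into the IDs decoded so far. The token layout, the function name `lz_join`, and the unbounded window are all hypothetical simplifications. Literals probe a hash table built on S; back-references copy the cached join results along with the IDs, so they cost no probes.

```python
from collections import defaultdict


def lz_join(tokens, s_rows):
    """Join an LZ77-style encoded ID column of R against S while decoding.

    tokens: ("lit", id) or ("ref", distance, length)  (assumed format)
    s_rows: iterable of (id, payload) pairs forming the build table S
    Returns (results, probes), where results is a list of
    (r_id, matching_s_payloads) and probes counts hash lookups on S.
    """
    # Build phase: hash table on S keyed by ID.
    build = defaultdict(list)
    for key, payload in s_rows:
        build[key].append(payload)

    ids = []      # decoded R IDs (the window; unbounded here for simplicity)
    matches = []  # matches[i] caches the join result for ids[i]
    probes = 0

    for tok in tokens:
        if tok[0] == "lit":
            # New ID: probe S once and cache the result.
            rid = tok[1]
            probes += 1
            ids.append(rid)
            matches.append(build.get(rid, []))
        else:
            # Repeated sequence: reuse cached results, no probing.
            _, dist, length = tok
            if dist < 1 or dist > len(ids):
                raise ValueError("back-reference points outside the window")
            start = len(ids) - dist
            for k in range(length):
                # Reading position start + k also handles overlapping
                # copies, since the lists grow as we append.
                ids.append(ids[start + k])
                matches.append(matches[start + k])

    return list(zip(ids, matches)), probes


if __name__ == "__main__":
    s_rows = [(1, "a"), (2, "b"), (2, "c")]
    tokens = [("lit", 1), ("lit", 2), ("lit", 3), ("ref", 3, 3)]
    results, probes = lz_join(tokens, s_rows)
    for rid, payloads in results:
        print(rid, payloads)
    print("probes:", probes)
```

In this sketch, six R tuples are joined with only three probes of S. The saving grows with the share of R that is covered by back-references, which is in line with the abstract's remark that a higher compression ratio helps.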