Incrementally maintaining run-length encoded attributes in column stores

  • Authors:
  • Abhijeet Mohapatra;Michael Genesereth

  • Affiliations:
  • Stanford University;Stanford University

  • Venue:
  • Proceedings of the 16th International Database Engineering & Applications Symposium
  • Year:
  • 2012

Abstract

Run-length encoding is a popular compression scheme that is used extensively to compress attribute values in column stores. Out-of-order insertion of tuples can degrade the compression achieved by run-length encoding and, consequently, the performance of reads. In-place insertions, deletions, and updates of tuples in a column-store relation with n tuples take O(n) time. This linear cost is typically avoided by amortizing the cost of updates over batches. However, the relation is decompressed and subsequently re-compressed after a batch of updates is applied, which adds to the time complexity. We propose a novel indexing scheme called count indexes that supports O(log n) in-place insertions, deletions, updates, and lookups on a run-length encoded sequence with n runs. We also show that count indexes efficiently update a batch of tuples, requiring almost constant time per updated tuple. Additionally, we show that count indexes are optimal. We extend count indexes to support O(log n) updates on bitmapped sequences with n values and adapt them to block-based stores.
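To make the cost described in the abstract concrete, the following is a minimal Python sketch of the naive baseline, not of the count indexes the paper proposes. It assumes a column is stored as a list of (value, run length) pairs; the names `rle_encode` and `rle_insert` are hypothetical and chosen only for illustration. Finding the run that covers a position requires a linear scan over the runs, and an out-of-order insertion can split one run into three.

```python
from typing import List, Sequence, Tuple

Run = Tuple[str, int]  # (value, run length); an assumed representation


def rle_encode(values: Sequence[str]) -> List[Run]:
    """Collapse consecutive equal values into (value, count) runs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_insert(runs: List[Run], pos: int, value: str) -> List[Run]:
    """Insert `value` at 0-based decoded position `pos`.

    Naive baseline: scans the runs from the start to locate `pos`,
    so the cost grows linearly with the number of runs.
    """
    total = sum(count for _, count in runs)
    if not 0 <= pos <= total:
        raise IndexError("position out of range")

    pieces: List[Run] = []
    start = 0          # decoded position where the current run begins
    placed = False
    for v, count in runs:
        if not placed and pos < start + count:
            left = pos - start
            if left > 0:
                pieces.append((v, left))      # part of the run before pos
            pieces.append((value, 1))         # the inserted value
            pieces.append((v, count - left))  # remainder of the run
            placed = True
        else:
            pieces.append((v, count))
        start += count
    if not placed:
        pieces.append((value, 1))             # append at the very end

    # Merge neighbours that ended up with the same value.
    merged: List[Run] = []
    for v, count in pieces:
        if merged and merged[-1][0] == v:
            merged[-1] = (v, merged[-1][1] + count)
        else:
            merged.append((v, count))
    return merged


runs = rle_encode("aaabbb")
fragmented = rle_insert(runs, 1, "b")   # splits the run of 'a'
extended = rle_insert(runs, 3, "a")     # extends the run of 'a'
```

In this sketch, inserting `"b"` in the middle of the run of `"a"` doubles the number of runs, which illustrates how out-of-order insertions can degrade compression. An index that tracks cumulative run counts in a balanced tree is one way to avoid the linear scan; the details of how count indexes do this are in the paper itself and are not reproduced here.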