Incrementally maintaining run-length encoded attributes in column stores

Authors:
Abhijeet Mohapatra;Michael Genesereth
Affiliations:
Stanford University;Stanford University
Venue:
Proceedings of the 16th International Database Engineering & Applications Symposium
Year:
2012

Abstract

Run-length encoding is a popular compression scheme that is used extensively to compress attribute values in column stores. Out-of-order insertion of tuples can degrade the compression achieved by run-length encoding and, consequently, read performance. In-place insertions, deletions, and updates of tuples in a column-store relation with n tuples take O(n) time. This linear cost is typically avoided by amortizing the cost of updates over batches. However, the relation is decompressed and subsequently re-compressed after a batch of updates is applied, which adds to the time complexity. We propose a novel indexing scheme called count indexes that supports O(log n) in-place insertions, deletions, updates, and lookups on a run-length encoded sequence with n runs. We also show that count indexes efficiently update a batch of tuples, requiring almost constant time per updated tuple. Additionally, we show that count indexes are optimal. We extend count indexes to support O(log n) updates on bitmapped sequences with n values and adapt them to block-based stores.
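
The baseline that the abstract argues against can be illustrated with a minimal Python sketch. It is not the count index itself, whose structure this record does not describe. The function names (`rle_encode`, `rle_insert`, `rle_decode`) and the flat list of (value, length) pairs are assumptions made for illustration only. The sketch shows two things the abstract mentions: finding the insertion point takes a linear scan, and an out-of-order value splits an existing run, which degrades compression.

```python
from typing import List, Tuple

Run = Tuple[str, int]  # (value, run length); hypothetical representation


def rle_encode(values: List[str]) -> List[Run]:
    """Collapse consecutive equal values into (value, length) runs."""
    runs: List[Run] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs


def rle_decode(runs: List[Run]) -> List[str]:
    """Expand runs back into the plain sequence of values."""
    return [v for v, length in runs for _ in range(length)]


def rle_insert(runs: List[Run], pos: int, value: str) -> None:
    """Insert `value` at 0-based tuple position `pos`, in place.

    The scan over `runs` is linear in the number of runs; this is the
    cost that an index over run counts would aim to avoid.
    """
    if pos < 0:
        raise IndexError("position out of range")
    offset = 0  # tuple position where the current run starts
    for i, (v, length) in enumerate(runs):
        end = offset + length
        if v == value and offset <= pos <= end:
            # Same value inside or adjacent to this run: just grow the run.
            runs[i] = (v, length + 1)
            return
        if pos == offset:
            # Different value on a run boundary: a new run of length 1.
            runs.insert(i, (value, 1))
            return
        if pos < end:
            # Different value strictly inside a run: split it into three.
            runs[i:i + 1] = [(v, pos - offset), (value, 1), (v, end - pos)]
            return
        offset = end
    if pos != offset:
        raise IndexError("position out of range")
    runs.append((value, 1))  # insertion at the very end


runs = rle_encode(list("aaabb"))   # two runs
rle_insert(runs, 1, "c")           # out-of-order insert splits the "a" run
print(runs)
```

After the insert, the two original runs have become four, although only one tuple was added. The abstract states that count indexes bring insertions, deletions, updates, and lookups down to O(log n) in the number of runs; the sketch above covers only the linear-time baseline.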