Variable length compression for bitmap indices

Authors:
Fabian Corrales;David Chiu;Jason Sawin
Affiliations:
Department of Mathematics and Computer Science, University of Puget Sound;School of Engineering and Computer Science, Washington State University;Department of Mathematics and Computer Science, University of Puget Sound
Venue:
DEXA'11 Proceedings of the 22nd international conference on Database and expert systems applications - Volume Part II
Year:
2011

Abstract

Modern large-scale applications generate staggering amounts of data. To summarize and index these data sets, databases often use bitmap indices. These indices have been widely adopted because of two properties: (1) they can leverage fast bit-wise operations for query processing, and (2) they are compressible. Today, two pervasive bitmap compression schemes employ a variation of run-length encoding, aligned over bytes (BBC) and words (WAH), respectively. While BBC typically offers high compression ratios, WAH can achieve faster query processing, but often at the cost of space. Recent work has further shown that reordering the rows of a bitmap can dramatically increase compression. However, these sorted bitmaps often display patterns of changing run-lengths that are optimal for neither a byte nor a word alignment. We present a general framework to facilitate a variable length compression scheme. Given a bitmap, our algorithm is able to use different encoding lengths for compression on a per-column basis. We further present an algorithm that efficiently processes queries when encoding lengths share a common integer factor. Our empirical study shows that, in the best case on real data sets, our approach can out-compress BBC by 30% and WAH by 70%. Furthermore, we report a query processing speedup of 1.6× over BBC and 1.25× over WAH. We also show that these numbers improve drastically on our synthetic, uncorrelated data sets.
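
To make the idea of per-column encoding lengths concrete, the following is a minimal Python sketch of a WAH-like run-length encoder whose segment length is a parameter. It is an illustration only and not the encoding defined in the paper: the word layout (one flag bit, one fill bit, a run counter), the function names `encode`, `decode`, `compressed_bits`, and `best_segment_length`, and the candidate lengths 7, 14, and 28 (chosen here only because they share the common factor 7) are all assumptions.

```python
from typing import List, Sequence


def encode(bits: List[int], seg_len: int) -> List[int]:
    """Compress `bits` into words of seg_len + 1 bits (hypothetical WAH-like layout).

    Literal word: flag bit 0, then seg_len raw bits.
    Fill word:    flag bit 1, one fill bit, then a (seg_len - 1)-bit run counter.
    """
    if seg_len < 2:
        raise ValueError("seg_len must be at least 2")
    flag = 1 << seg_len                  # top bit marks a fill word
    max_run = (1 << (seg_len - 1)) - 1   # largest run count one fill word can hold
    # Pad with zeros so the bitmap splits into whole segments.
    padded = bits + [0] * (-len(bits) % seg_len)
    words: List[int] = []
    for i in range(0, len(padded), seg_len):
        seg = padded[i:i + seg_len]
        if all(b == seg[0] for b in seg):
            fill_bit = seg[0]
            last = words[-1] if words else 0
            same_fill = (last & flag) != 0 and ((last >> (seg_len - 1)) & 1) == fill_bit
            if same_fill and (last & max_run) < max_run:
                words[-1] = last + 1     # extend the previous run by one segment
            else:
                words.append(flag | (fill_bit << (seg_len - 1)) | 1)
        else:
            literal = 0
            for b in seg:                # pack the mixed segment, first bit highest
                literal = (literal << 1) | b
            words.append(literal)
    return words


def decode(words: List[int], seg_len: int, n_bits: int) -> List[int]:
    """Invert `encode`, dropping the zero padding beyond n_bits."""
    flag = 1 << seg_len
    max_run = (1 << (seg_len - 1)) - 1
    bits: List[int] = []
    for w in words:
        if w & flag:
            fill_bit = (w >> (seg_len - 1)) & 1
            bits.extend([fill_bit] * (seg_len * (w & max_run)))
        else:
            bits.extend((w >> s) & 1 for s in range(seg_len - 1, -1, -1))
    return bits[:n_bits]


def compressed_bits(bits: List[int], seg_len: int) -> int:
    """Size of the encoded column in bits."""
    return len(encode(bits, seg_len)) * (seg_len + 1)


def best_segment_length(bits: List[int], candidates: Sequence[int] = (7, 14, 28)) -> int:
    """Pick the candidate segment length that gives the smallest encoding."""
    return min(candidates, key=lambda s: compressed_bits(bits, s))


# Two toy columns: short runs around a mixed region, and two very long runs.
col_a = [0] * 70 + [1, 0, 1, 1, 0, 0, 1] + [1] * 21
col_b = [0] * 2800 + [1] * 2800
```

In this sketch, `best_segment_length(col_a)` selects 7 while `best_segment_length(col_b)` selects 14: short segments waste less space around the mixed region of `col_a`, whereas the long runs of `col_b` overflow the small run counter of a 7-bit segment and need several fill words. This is the kind of column-by-column difference that motivates choosing an encoding length per column rather than fixing one alignment for the whole bitmap.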