Column imprints: a secondary index structure

Authors:
Lefteris Sidirourgos;Martin Kersten
Affiliations:
CWI, Amsterdam, Netherlands;CWI, Amsterdam, Netherlands
Venue:
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Year:
2013



Abstract

Large-scale data warehouses rely heavily on secondary indexes, such as bitmaps and B-trees, to limit access to slow I/O devices. However, with the advent of large main-memory systems, cache-conscious secondary indexes are needed to also improve the transfer bandwidth between memory and CPU. In this paper, we introduce column imprints, a simple but efficient cache-conscious secondary index. A column imprint is a collection of many small bit vectors, each indexing the data points of a single cache line. An imprint is used during query evaluation to limit data access and thus minimize memory traffic. The compression for imprints is CPU-friendly and exploits the empirical observation that data often exhibits local clustering or partial ordering as a side effect of the construction process. Most importantly, column imprint compression remains effective and robust even in the case of unclustered data, while other state-of-the-art solutions fail. We conducted an extensive experimental evaluation to assess the applicability and the performance impact of column imprints. The storage overhead, when experimenting with real-world datasets, is only a few percent of the size of the columns being indexed. The evaluation time for more than 40,000 range queries of varying selectivity revealed the efficiency of the proposed index compared to zonemaps and bitmaps with WAH compression.
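
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify how values map to bits or how imprints are compressed, so the sketch assumes that bits correspond to value bins with hypothetical boundaries `bounds`, that a cache line holds `VALUES_PER_LINE` values, and it substitutes a plain run-length collapse of identical consecutive imprints for the paper's compression scheme. All function names are illustrative.

```python
from bisect import bisect_right

VALUES_PER_LINE = 8  # assumption: a 64-byte cache line holding 8-byte values


def build_imprints(column, bounds):
    """Return one bit vector (stored as an int) per cache-line-sized chunk.

    `bounds` is a sorted list of bin boundaries (an assumption of this
    sketch). A value v falls into bin bisect_right(bounds, v), which gives
    len(bounds) + 1 bins in total.
    """
    imprints = []
    for start in range(0, len(column), VALUES_PER_LINE):
        bits = 0
        for v in column[start:start + VALUES_PER_LINE]:
            bits |= 1 << bisect_right(bounds, v)  # mark the bin of v
        imprints.append(bits)
    return imprints


def compress(imprints):
    """Collapse runs of identical consecutive imprints into [bits, count].

    This run-length step is a stand-in chosen for the sketch; it pays off
    when neighbouring cache lines hold values from the same bins.
    """
    runs = []
    for bits in imprints:
        if runs and runs[-1][0] == bits:
            runs[-1][1] += 1
        else:
            runs.append([bits, 1])
    return runs


def range_query(column, runs, bounds, low, high):
    """Return the positions i with low <= column[i] <= high."""
    lo_bin = bisect_right(bounds, low)
    hi_bin = bisect_right(bounds, high)
    # Bit mask with bits lo_bin..hi_bin (inclusive) set.
    mask = ((1 << (hi_bin + 1)) - 1) & ~((1 << lo_bin) - 1)
    hits, line = [], 0
    for bits, count in runs:
        if bits & mask:  # only these cache lines can contain a match
            start = line * VALUES_PER_LINE
            stop = min((line + count) * VALUES_PER_LINE, len(column))
            hits.extend(i for i in range(start, stop)
                        if low <= column[i] <= high)
        line += count  # skipped runs never touch the column data
    return hits


column = list(range(40)) + [5] * 16
bounds = [10, 20, 30]
runs = compress(build_imprints(column, bounds))
matches = range_query(column, runs, bounds, 32, 35)
```

In this sketch, a cache line whose imprint shares no bit with the query mask is skipped without reading its values, which is the access-limiting effect the abstract describes. Lines that do share a bit may still contain no match, so each candidate value is checked against the predicate.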