Secondary indexing in one dimension: beyond b-trees and bitmap indexes

Authors:
Rasmus Pagh; Srinivasa Rao Satti
Affiliations:
IT University of Copenhagen, Copenhagen, Denmark; Seoul National University, Seoul, South Korea
Venue:
Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Year:
2009

Abstract

Let $\Sigma$ be a finite, ordered alphabet, and consider a string $x = x_1 x_2 \ldots x_n \in \Sigma^n$. A secondary index for $x$ answers alphabet range queries of the form: Given a range $[\alpha_l, \alpha_r] \subseteq \Sigma$, return the set $I_{[\alpha_l,\alpha_r]} = \{\, i \mid x_i \in [\alpha_l, \alpha_r] \,\}$. Secondary indexes are heavily used in relational databases and scientific data analysis. It is well-known that the obvious solution, storing a dictionary for the set $\bigcup_i \{x_i\}$ with a position set associated with each character, does not always give optimal query time. In this paper we give the first theoretically optimal data structure for the secondary indexing problem. In the I/O model, the amount of data read when answering a query is within a constant factor of the minimum space needed to represent the set $I_{[\alpha_l,\alpha_r]}$, assuming that the size of internal memory is $(|\Sigma| \lg n)^{\delta}$ blocks, for some constant $\delta > 0$. The space usage of the data structure is $O(n \lg |\Sigma|)$ bits in the worst case, and we further show how to bound the size of the data structure in terms of the 0th order entropy of $x$. We show how to support updates achieving various time-space trade-offs. We also consider an approximate version of the basic secondary indexing problem where a query reports a superset of $I_{[\alpha_l,\alpha_r]}$ containing each element not in $I_{[\alpha_l,\alpha_r]}$ with probability at most $\varepsilon$, where $\varepsilon > 0$ is the false positive probability. For this problem the amount of data that needs to be read by the query algorithm is reduced to $O(|I_{[\alpha_l,\alpha_r]}| \lg(1/\varepsilon))$ bits.
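
To make the problem statement concrete, the following is a minimal in-memory sketch of the "obvious solution" the abstract mentions: a sorted dictionary of the characters that occur in the string, each with its list of positions. It is not the data structure proposed in the paper. The class name `NaiveSecondaryIndex` and the method name `range_query` are hypothetical, chosen only for illustration, and positions are assumed to be 1-based.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class NaiveSecondaryIndex:
    """Baseline only: one sorted position list per distinct character.

    Hypothetical illustration of the 'obvious solution' from the abstract,
    not the optimal structure described in the paper.
    """

    def __init__(self, x):
        positions = defaultdict(list)
        # Positions are 1-based (an assumption of this sketch).
        for i, ch in enumerate(x, start=1):
            positions[ch].append(i)
        # Distinct characters in alphabet order, for binary search on ranges.
        self.chars = sorted(positions)
        self.positions = dict(positions)

    def range_query(self, a_l, a_r):
        """Return the sorted positions i whose character lies in [a_l, a_r]."""
        lo = bisect_left(self.chars, a_l)
        hi = bisect_right(self.chars, a_r)
        result = []
        # One position list is read per distinct character in the range.
        for ch in self.chars[lo:hi]:
            result.extend(self.positions[ch])
        return sorted(result)


index = NaiveSecondaryIndex("banana")
print(index.range_query("a", "b"))  # positions holding 'a' or 'b'
```

In this sketch a query reads one separate position list for every distinct character in the range and then merges them, so a wide range over many characters touches many lists. That gives some intuition for why the abstract says this baseline does not always give optimal query time.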