Let Σ be a finite, ordered alphabet, and consider a string x = x_1 x_2 ... x_n ∈ Σ^n. A secondary index for x answers alphabet range queries of the form: given a range [α_l, α_r] ⊆ Σ, return the set I[α_l, α_r] = {i | x_i ∈ [α_l, α_r]}. Secondary indexes are heavily used in relational databases and scientific data analysis. It is well known that the obvious solution, storing a dictionary for the set ∪_i {x_i} with a position set associated with each character, does not always give optimal query time. In this paper we give the first theoretically optimal data structure for the secondary indexing problem. In the I/O model, the amount of data read when answering a query is within a constant factor of the minimum space needed to represent the set I[α_l, α_r], assuming that the size of internal memory is (|Σ| lg n)^δ blocks, for some constant δ > 0. The space usage of the data structure is O(n lg |Σ|) bits in the worst case, and we further show how to bound the size of the data structure in terms of the 0th-order entropy of x. We show how to support updates, achieving various time-space trade-offs. We also consider an approximate version of the basic secondary indexing problem, where a query reports a superset of I[α_l, α_r] containing each element not in I[α_l, α_r] with probability at most ε, where ε > 0 is the false positive probability. For this problem the amount of data that needs to be read by the query algorithm is reduced to O(|I[α_l, α_r]| lg(1/ε)) bits.
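
To make the problem statement concrete, the following is a minimal Python sketch of the "obvious solution" mentioned above: a dictionary from each distinct character to its position list, queried over an alphabet range. It is only the baseline that the abstract says is not always optimal, not the paper's data structure. The names build_naive_index and range_query are hypothetical and chosen for illustration, and positions are assumed to be 1-based.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def build_naive_index(x):
    """Map each distinct character of x to the list of its 1-based positions.

    Hypothetical helper: this is the dictionary-plus-position-sets baseline,
    not the optimal structure described in the abstract.
    """
    positions = defaultdict(list)
    for i, ch in enumerate(x, start=1):
        positions[ch].append(i)
    alphabet = sorted(positions)  # distinct characters in alphabet order
    return alphabet, dict(positions)


def range_query(index, lo, hi):
    """Return I[lo, hi] = {i | x_i in [lo, hi]} as a sorted list."""
    alphabet, positions = index
    # Locate the slice of distinct characters that fall inside [lo, hi].
    start = bisect_left(alphabet, lo)
    end = bisect_right(alphabet, hi)
    result = []
    for ch in alphabet[start:end]:
        result.extend(positions[ch])  # one position list per character in range
    return sorted(result)


index = build_naive_index("banana")
print(range_query(index, "a", "b"))  # [1, 2, 4, 6]
```

In this sketch a query reads one position list for every distinct character in the range, so its cost depends on how the answer is spread across lists rather than only on the size of I[α_l, α_r].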