Fast evaluation of union-intersection expressions

Authors:
Philip Bille; Anna Pagh; Rasmus Pagh
Affiliations:
IT University of Copenhagen, Denmark; IT University of Copenhagen, Denmark; IT University of Copenhagen, Denmark
Venue:
ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation
Year:
2007

Abstract

We show how to represent sets in a linear-space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in, for example, information retrieval and database systems. We mainly consider the RAM model of computation and sets of machine words, but also state our results in the I/O model. On a RAM with word size $w$, a special case of our result is that the intersection of $m$ (preprocessed) sets, containing $n$ elements in total, can be computed in expected time $O(n (\log w)^2 / w + km)$, where $k$ is the number of elements in the intersection. If the first of the two terms dominates, this is a factor $w^{1-o(1)}$ faster than the standard solution of merging sorted lists. We show a cell probe lower bound of $\Omega(n/(wm \log m) + (1 - \log k / w)k)$ on the time, meaning that our upper bound is nearly optimal for small $m$. Our algorithm uses a novel combination of approximate set representations and word-level parallelism.
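
To make the comparison in the abstract concrete, the following Python sketch shows two things. The first function is a plain version of the baseline the abstract calls "merging sorted lists". The second is only a toy illustration of the general idea of pairing an approximate set representation with word-level parallelism; it is not the authors' data structure and does not achieve their bounds. All names (`intersect_by_merging`, `bucket`, `signature`, `intersect_with_filter`, `SIGNATURE_BITS`) and the choice of a single multiplicative hash are assumptions made for this example.

```python
from typing import Iterable, List, Sequence, Set

SIGNATURE_BITS = 64          # hypothetical stand-in for the word size w
MASK64 = (1 << 64) - 1


def intersect_by_merging(sorted_lists: Sequence[Sequence[int]]) -> List[int]:
    """Baseline: intersect sorted, duplicate-free lists by advancing pointers."""
    if not sorted_lists:
        return []
    positions = [0] * len(sorted_lists)
    result: List[int] = []
    while all(p < len(lst) for p, lst in zip(positions, sorted_lists)):
        heads = [lst[p] for p, lst in zip(positions, sorted_lists)]
        largest = max(heads)
        if min(heads) == largest:
            # Every list currently points at the same value: it is common.
            result.append(largest)
            positions = [p + 1 for p in positions]
        else:
            # Advance each pointer whose head is smaller than the largest head.
            positions = [p + (lst[p] < largest)
                         for p, lst in zip(positions, sorted_lists)]
    return result


def bucket(x: int, bits: int = SIGNATURE_BITS) -> int:
    """Map an element to a bit position (assumed multiplicative hash)."""
    return ((x * 0x9E3779B97F4A7C15) & MASK64) % bits


def signature(elements: Iterable[int], bits: int = SIGNATURE_BITS) -> int:
    """Approximate representation: one bit per occupied hash bucket."""
    sig = 0
    for x in elements:
        sig |= 1 << bucket(x, bits)
    return sig


def intersect_with_filter(sets: Sequence[Set[int]]) -> List[int]:
    """Toy filter-then-verify intersection (illustration only)."""
    if not sets:
        return []
    # In a preprocessed setting the signatures would be computed once, up front.
    combined = signature(sets[0])
    for s in sets[1:]:
        combined &= signature(s)      # one AND handles all buckets at once
    smallest = min(sets, key=len)
    # A cleared bit proves absence from some set; a set bit needs exact checking.
    candidates = [x for x in smallest if (combined >> bucket(x)) & 1]
    return sorted(x for x in candidates if all(x in s for s in sets))


lists = [[1, 3, 5, 7, 9, 11], [3, 4, 5, 9, 10], [0, 3, 9, 12]]
print(intersect_by_merging(lists))                    # [3, 9]
print(intersect_with_filter([set(l) for l in lists])) # [3, 9]
```

The filter can only produce false positives, never false negatives, so the exact membership check at the end keeps the result correct regardless of hash quality. The single bitwise AND over a packed signature is the part that hints at why word-level parallelism can save roughly a factor of $w$ over element-by-element merging.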