We show how to represent sets in a linear-space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in, e.g., information retrieval and database systems. We mainly consider the RAM model of computation and sets of machine words, but we also state our results in the I/O model. On a RAM with word size $w$, a special case of our result is that the intersection of $m$ (preprocessed) sets, containing $n$ elements in total, can be computed in expected time $O(n(\log w)^2/w + km)$, where $k$ is the number of elements in the intersection. If the first of the two terms dominates, this is a factor $w^{1-o(1)}$ faster than the standard solution of merging sorted lists. We show a cell probe lower bound of time $\Omega(n/(wm\log m) + (1 - \log k / w)k)$, meaning that our upper bound is nearly optimal for small $m$. Our algorithm uses a novel combination of approximate set representations and word-level parallelism.
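The abstract compares against the standard solution of merging sorted lists and credits its speedup to approximate set representations combined with word-level parallelism. As a rough illustration only, the Python sketch below shows (a) that merge baseline and (b) a simplified filter-then-verify pattern: each set is hashed into a small range and stored as a bitmap, the bitmaps are ANDed (Python's big-integer AND processes many bits per machine operation, loosely analogous to word-level parallelism), and surviving hash values are checked exactly. This is not the authors' data structure and carries none of their bounds. The names `merge_intersect`, `make_hash`, `HashedSet`, and `filtered_intersect` are hypothetical, the 64-bit word size is an assumption, and multiplicative hashing is a standard scheme swapped in for illustration.

```python
import random
from typing import Callable, Iterable, List

W = 64  # assumed machine word size in bits


def merge_intersect(a: List[int], b: List[int]) -> List[int]:
    """Baseline: intersect two sorted lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def make_hash(l: int, seed: int = 1) -> Callable[[int], int]:
    """Multiplicative hashing of W-bit keys into [0, 2**l); assumes 1 <= l <= W."""
    rng = random.Random(seed)
    a = rng.randrange(1, 1 << W, 2)  # random odd multiplier
    mask = (1 << W) - 1

    def h(x: int) -> int:
        # Keep the low W bits of a*x, then take their top l bits.
        return ((a * x) & mask) >> (W - l)

    return h


class HashedSet:
    """Hypothetical preprocessed set: exact members plus a hash bitmap."""

    def __init__(self, elems: Iterable[int], h: Callable[[int], int]):
        self.elems = set(elems)   # exact membership for verification
        self.bitmap = 0           # bit v set iff some element hashes to v
        self.buckets = {}         # hash value -> elements with that hash
        for x in self.elems:
            v = h(x)
            self.bitmap |= 1 << v
            self.buckets.setdefault(v, []).append(x)


def filtered_intersect(sets: List[HashedSet]) -> List[int]:
    """Intersect sets built with the SAME hash function: filter, then verify."""
    common = sets[0].bitmap
    for s in sets[1:]:
        common &= s.bitmap  # approximate intersection of the hashed sets
    smallest = min(sets, key=lambda s: len(s.elems))
    out = []
    while common:
        low = common & -common        # isolate the lowest set bit
        v = low.bit_length() - 1      # its hash value
        common ^= low
        # Every set has bit v, so smallest.buckets[v] exists; drop false positives.
        for x in smallest.buckets[v]:
            if all(x in s.elems for s in sets):
                out.append(x)
    return sorted(out)
```

A larger hash range (bigger `l`) means longer bitmaps but fewer false candidates to verify. Because every candidate is checked against the exact sets, the result is always the true intersection; only the amount of verification work depends on hash collisions.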