Set intersection is a fundamental operation in information retrieval and databases, and there is a long history of developing efficient algorithms for it. In this paper, we describe a new data structure, the Cardinality Filter, that quickly computes an upper bound on the size of a set intersection. Such an upper bound can accelerate many applications, such as top-k query processing in text mining. Given finite sets A and B, the expected time to compute the upper bound on the intersection size |A ∩ B| is O((|A| + |B|) / w), where w is the machine word length. This is much faster than the current best algorithm for the exact intersection, which runs in O((|A| + |B|) / √w + |A ∩ B|) expected time. Our performance studies show that our implementations of Cardinality Filters are 2 to 10 times faster than existing set intersection algorithms, and that the time for a top-k query in a text mining application can be reduced by half.
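To make the idea of an intersection-size upper bound concrete, here is a minimal sketch of one simple way to obtain such a bound. It is not the Cardinality Filter described in the abstract, whose internals are not given here, and it does not attempt to reach the O((|A| + |B|) / w) running time. It assumes a generic bucketing scheme: each set is hashed into a fixed number of buckets, and because every common element falls into the same bucket on both sides, the sum of the per-bucket minimum counts can never be smaller than |A ∩ B|. The names `bucket_counts`, `intersection_upper_bound`, and `num_buckets` are hypothetical.

```python
def bucket_counts(items, num_buckets):
    """Count how many distinct elements of `items` hash into each bucket."""
    counts = [0] * num_buckets
    for x in set(items):  # de-duplicate so each element is counted once
        counts[hash(x) % num_buckets] += 1
    return counts


def intersection_upper_bound(counts_a, counts_b):
    """Upper bound on |A ∩ B| from two bucket-count summaries.

    An element common to both sets lands in the same bucket in both
    summaries, so each bucket can contribute at most min(a, b)
    common elements.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("summaries must use the same number of buckets")
    return sum(min(a, b) for a, b in zip(counts_a, counts_b))


# Illustrative sets (assumed data, not from the paper).
A = set(range(1, 9))    # {1, ..., 8}
B = set(range(5, 13))   # {5, ..., 12}

summary_a = bucket_counts(A, num_buckets=16)
summary_b = bucket_counts(B, num_buckets=16)

bound = intersection_upper_bound(summary_a, summary_b)
exact = len(A & B)
print(f"upper bound = {bound}, exact size = {exact}")
```

In a top-k setting, a bound like this could be used to skip an exact intersection whenever the bound is already below the current k-th best score. More buckets generally tighten the bound at the cost of a larger summary.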