Faster upper bounding of intersection sizes

Authors: Daisuke Takuma; Hiroki Yanagisawa
Affiliations: IBM Research - Tokyo, Tokyo, Japan; IBM Research - Tokyo, Tokyo, Japan
Venue: Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval
Year: 2013

Abstract

There is a long history of developing efficient algorithms for set intersection, a fundamental operation in information retrieval and databases. In this paper, we describe a new data structure, the Cardinality Filter, that quickly computes an upper bound on the size of a set intersection. Such an upper bound can be used to accelerate many applications, such as top-k query processing in text mining. Given finite sets A and B, the expected time to compute an upper bound on the intersection size |A ∩ B| is O((|A| + |B|) / w), where w is the machine word length. This is much faster than the current best algorithm for computing the exact intersection, which runs in O((|A| + |B|) / √w + |A ∩ B|) expected time. Our performance studies show that our implementations of Cardinality Filters are 2 to 10 times faster than existing set intersection algorithms, and that the time for a top-k query in a text-mining application can be reduced by half.
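The abstract does not describe the internal layout of a Cardinality Filter, so the following is only a minimal sketch of one way to obtain a word-parallel upper bound on |A ∩ B|; it is not the authors' data structure. It assumes integer elements, hashes them into 2**log2_buckets buckets with a multiplicative hash (an illustrative choice), and keeps one bitmap per level j in which bit b is set when bucket b holds more than j elements of the set. Summing popcount(A_j AND B_j) over all levels equals the sum over buckets of min(|A_b|, |B_b|), and no bucket can contain more common elements than that minimum, so the result never underestimates the true intersection size. The names `bucket_of`, `build_levels`, and `intersection_upper_bound` are hypothetical.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # common 64-bit multiplicative-hash constant (illustrative)


def bucket_of(x: int, log2_buckets: int) -> int:
    """Hash an integer into one of 2**log2_buckets buckets (0 <= log2_buckets <= 64)."""
    return ((x * GOLDEN) & MASK64) >> (64 - log2_buckets)


def build_levels(elements: Iterable[int], log2_buckets: int) -> list[int]:
    """Hypothetical summary: bit b of levels[j] is set iff bucket b holds more than j elements."""
    counts = Counter(bucket_of(x, log2_buckets) for x in set(elements))
    levels: list[int] = []
    for b, c in counts.items():
        # Grow the level list so that this bucket's count c fits.
        while len(levels) < c:
            levels.append(0)
        # Encode the count in unary: set bit b on levels 0..c-1.
        for j in range(c):
            levels[j] |= 1 << b
    return levels


def intersection_upper_bound(la: list[int], lb: list[int]) -> int:
    """Sum over levels of popcount(A_j & B_j), i.e. sum over buckets of
    min(|A_b|, |B_b|), which is always >= |A intersect B|."""
    # zip stops at the shorter list; deeper levels would contribute 0 anyway.
    return sum(bin(a & b).count("1") for a, b in zip(la, lb))


if __name__ == "__main__":
    A = range(0, 1000, 2)
    B = range(0, 1000, 3)
    la, lb = build_levels(A, 10), build_levels(B, 10)
    exact = len(set(A) & set(B))
    print("exact:", exact, "upper bound:", intersection_upper_bound(la, lb))
```

Using more buckets tightens the bound, because fewer unrelated elements share a bucket, at the cost of longer bitmaps. The bound is exact when both inputs are the same set, and it can never exceed min(|A|, |B|). Each bitwise AND and popcount covers many buckets per machine operation, which is the kind of word-level parallelism that a bound of the form O((|A| + |B|) / w) suggests.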