Maximal intersection queries in randomized graph models

Authors:
Benjamin Hoffmann; Yury Lifshits; Dirk Nowotka
Affiliations:
FMI, Universität Stuttgart, Germany; Steklov Institute of Mathematics at St. Petersburg, Russia; FMI, Universität Stuttgart, Germany
Venue:
CSR'07 Proceedings of the Second international conference on Computer Science: theory and applications
Year:
2007

Abstract

Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family that has a maximal intersection with the query set? Strict time constraints on the query and on any preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and the distribution of online advertisements. In general, maximal intersection queries are computationally expensive, so one needs to add some assumptions about the input in order to get an efficient solution. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that, with very high probability, an almost optimal solution is found in time logarithmic in the size of the family. In particular, we point out a threshold phenomenon in the probabilities of intersecting sets in each of our two input models, which leads to the efficient algorithms mentioned above.
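
To make the problem statement concrete, the following is a minimal sketch of an exact maximal intersection query answered with a plain inverted index. It is not the algorithm proposed in the paper: it is a generic baseline whose query time can grow linearly with the size of the family, which is the cost the abstract describes as expensive. The function names `build_inverted_index` and `max_intersection_query`, the use of integers as set elements, and the lowest-index tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_inverted_index(family):
    """Preprocessing: map each element to the ids of the sets containing it."""
    index = defaultdict(list)
    for set_id, members in enumerate(family):
        for element in members:
            index[element].append(set_id)
    return index


def max_intersection_query(index, query):
    """Return (set_id, intersection_size) for a set with maximal intersection.

    Ties are broken in favour of the smallest set id (an arbitrary choice).
    Returns (None, 0) when no set in the family shares an element with the query.
    """
    counts = Counter()
    for element in query:
        # index.get avoids creating empty entries for unseen elements.
        for set_id in index.get(element, ()):
            counts[set_id] += 1
    if not counts:
        return None, 0
    return max(counts.items(), key=lambda kv: (kv[1], -kv[0]))


# Hypothetical toy family of three sets.
family = [{1, 2, 3}, {2, 3, 4, 5}, {7, 8}]
index = build_inverted_index(family)
best_id, best_size = max_intersection_query(index, {2, 3, 4, 9})
```

In this toy example the query shares two elements with the first set and three with the second, so the second set (id 1) is returned. The baseline touches every posting of every query element, so it gives no sublinear guarantee; it only fixes what "maximal intersection" means before any distributional assumptions are introduced.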