An algorithmic treatment of strong queries

Authors:
Ravi Kumar; Silvio Lattanzi; Prabhakar Raghavan
Affiliations:
Yahoo! Research, Sunnyvale, CA, USA; La Sapienza Univ. of Rome, Rome, Italy; Yahoo! Research, Sunnyvale, CA, USA
Venue:
Proceedings of the fourth ACM international conference on Web search and data mining
Year:
2011

Abstract

A strong query for a target document with respect to an index is the smallest query for which the index returns the target document as the top result. The strong query problem was first studied more than a decade ago in the context of measuring search engine overlap. Despite being simple to state and long-standing, the problem has not been sufficiently addressed in a formal manner. In this paper we provide the first rigorous treatment of the strong query problem. We show a connection between this problem and the set cover problem, and use it to obtain basic hardness and algorithmic results. Experiments on more than 10K documents show that our proposed algorithm performs much better than the widely used heuristic based on word frequency. En route, our study suggests that fewer than four words on average can suffice to uniquely identify a web page.
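The abstract does not spell out the reduction to set cover. The sketch below illustrates one simplified version of it, assuming a Boolean AND index in which the target is the top result exactly when it is the only document containing every query term. Under that assumption, each term t of the target "covers" the other documents that lack t, so a strong query is a set cover of all other documents, and the classic greedy set-cover algorithm (with its well-known logarithmic approximation guarantee) applies. The function names, the toy collection, and the rarest-first baseline are hypothetical illustrations. They are not the paper's algorithm, its ranking model, or its exact word-frequency heuristic.

```python
from typing import Dict, List, Set


def strong_query_greedy(target: str, docs: Dict[str, Set[str]]) -> List[str]:
    """Greedy set-cover sketch of a strong query (hypothetical helper).

    Assumption: the index returns exactly the documents that contain every
    query term, so `target` is the unique top result once every other
    document is missing at least one chosen term.
    """
    target_terms = docs[target]
    # Universe of the set cover: every other document that must be excluded.
    uncovered = {d for d in docs if d != target}
    query: List[str] = []
    while uncovered:
        # Term t covers the still-uncovered documents that do not contain t.
        # Sorting the candidates makes tie-breaking deterministic.
        best = max(
            sorted(target_terms - set(query)),
            key=lambda t: sum(1 for d in uncovered if t not in docs[d]),
            default=None,
        )
        gain = {d for d in uncovered if best is not None and best not in docs[d]}
        if not gain:
            raise ValueError("no strong query exists under the AND model")
        query.append(best)
        uncovered -= gain
    return query


def strong_query_rarest_first(target: str, docs: Dict[str, Set[str]]) -> List[str]:
    """Hypothetical frequency baseline: add the target's rarest terms first."""
    df = {t: sum(1 for terms in docs.values() if t in terms) for t in docs[target]}
    uncovered = {d for d in docs if d != target}
    query: List[str] = []
    for t in sorted(docs[target], key=lambda t: (df[t], t)):
        if not uncovered:
            break
        query.append(t)
        # Only documents that also contain t still match the query.
        uncovered = {d for d in uncovered if t in docs[d]}
    if uncovered:
        raise ValueError("no strong query exists under the AND model")
    return query


# Toy collection (made up for illustration): each document is a set of terms.
example_docs = {
    "target": {"a", "b", "c", "d"},
    "d1": {"c", "d"},
    "d2": {"c", "d"},
    "d3": {"c", "d"},
    "d4": {"a", "b", "d"},
    "d5": {"b", "c", "d"},
}

print(strong_query_greedy("target", example_docs))        # ['a', 'c']
print(strong_query_rarest_first("target", example_docs))  # ['a', 'b', 'c']
```

On this toy collection both methods start with the rarest term "a", but the rarest-first baseline then adds "b", which excludes no remaining document, while greedy picks "c", the term that excludes the last competitor (d4). This kind of redundancy between rare terms is what a set-cover view accounts for and a pure frequency ordering ignores.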