A strong query for a target document with respect to an index is the smallest query for which the index returns the target document as the top result. The strong query problem was first studied more than a decade ago in the context of measuring search engine overlap. Although it is simple to state and long-standing, the problem has not received an adequate formal treatment. In this paper we provide the first rigorous treatment of the strong query problem. We show a connection between this problem and the set cover problem, and use it to obtain basic hardness and algorithmic results. Experiments on more than 10K documents show that our proposed algorithm performs much better than the widely used word-frequency-based heuristic. En route, our study suggests that fewer than four words on average can be sufficient to uniquely identify web pages.
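The connection to set cover can be illustrated with a small sketch. It is not the algorithm proposed in the paper, whose details are not given here. It assumes a simplified conjunctive retrieval model in which a query matches a document only if the document contains every query word, so each word of the target "covers" the competing documents that lack it, and the standard greedy set cover rule picks the word that eliminates the most remaining competitors. The function name `greedy_strong_query` and the toy index `docs` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_strong_query(target: str, docs: Dict[str, Set[str]]) -> List[str]:
    """Greedy set-cover sketch: pick words of the target until no other
    document contains all of them. Conjunctive (AND) matching is an
    assumption of this sketch, not something stated in the abstract."""
    # Competing documents that still match every word chosen so far.
    remaining = {d for d in docs if d != target}
    candidates = set(docs[target])
    query: List[str] = []
    while remaining:
        if not candidates:
            raise ValueError("no query over the target's words isolates it")
        # A word covers the competitors that do not contain it.
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(candidates),
                   key=lambda w: sum(1 for d in remaining if w not in docs[d]))
        covered = {d for d in remaining if best not in docs[d]}
        if not covered:
            # Every remaining competitor contains all of the target's words.
            raise ValueError("no query over the target's words isolates it")
        query.append(best)
        candidates.discard(best)
        remaining -= covered
    return query


# Toy index: document id -> set of words it contains (hypothetical data).
docs = {
    "a": {"greedy", "set", "cover", "query"},
    "b": {"greedy", "set", "cover"},
    "c": {"set", "query", "index"},
    "d": {"cover", "query", "web"},
}
print(greedy_strong_query("a", docs))
```

On this toy index the sketch first picks "greedy", which rules out documents "c" and "d", and then "query", which rules out "b". As with greedy set cover in general, the result is not guaranteed to be the smallest possible query.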