Capacity-constrained query formulation

Authors:
Matthias Hagen; Benno Maria Stein
Affiliations:
Faculty of Media, Bauhaus University Weimar, Germany (both authors)
Venue:
ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
Year:
2010

Citing 4

Fast Algorithms for Mining Association Rules in Large Databases
VLDB '94 Proceedings of the 20th International Conference on Very Large Data Bases

Using Noun Phrase Heads to Extract Document Keyphrases
AI '00 Proceedings of the 13th Biennial Conference of the Canadian Society on Computational Studies of Intelligence: Advances in Artificial Intelligence

Constructing Web search queries from the user's information need expressed in a natural language
Proceedings of the 2003 ACM symposium on Applied computing

Maximal termsets as a query structuring mechanism
Proceedings of the 14th ACM international conference on Information and knowledge management

Cited 3

Introducing the user-over-ranking hypothesis
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval

Candidate document retrieval for web-scale text reuse detection
SPIRE'11 Proceedings of the 18th international conference on String processing and information retrieval

From keywords to keyqueries: content descriptors for the web
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Abstract

Given a set of keyphrases, we analyze how to form Web queries from these phrases that, taken together, return a specified number of hits. The use case for this problem is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For the query formulation problem, we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered the most sensible non-heuristic baseline, our expected savings are on average 50% when queries for 9 or 10 phrases are to be constructed.
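
To make the problem setting concrete, the following is a minimal sketch and not the authors' algorithm. It assumes hit counts of conjunctive queries can be estimated from single-phrase hit counts under a naive independence assumption, a simplification standing in for the co-occurrence probabilities mentioned in the abstract. It also applies the hit bound per query rather than to all queries taken together. The names `INDEX_SIZE`, `HITS`, `estimated_hits`, and `greedy_queries`, as well as all numbers, are hypothetical.

```python
from math import prod

# Hypothetical toy statistics (assumed for illustration, not from the paper).
INDEX_SIZE = 1_000_000
HITS = {
    "co-occurrence": 50_000,
    "query formulation": 20_000,
    "plagiarism detection": 5_000,
    "maximal termset": 200,
}


def estimated_hits(query, hits=HITS, n=INDEX_SIZE):
    """Estimate the hits of a conjunctive query, assuming phrases occur independently."""
    return n * prod(hits[p] / n for p in query)


def greedy_queries(phrases, capacity, hits=HITS, n=INDEX_SIZE):
    """Cover every phrase with a query whose estimated hits fit the capacity.

    Phrases are taken from most to least frequent; each query absorbs further
    phrases until its estimate drops to the capacity or no phrases remain
    (so the last query may still exceed the capacity).
    """
    uncovered = sorted(phrases, key=lambda p: hits[p], reverse=True)
    queries = []
    while uncovered:
        query = [uncovered.pop(0)]
        while uncovered and estimated_hits(query, hits, n) > capacity:
            query.append(uncovered.pop(0))
        queries.append(query)
    return queries


queries = greedy_queries(list(HITS), capacity=1_500)
for q in queries:
    print(q, round(estimated_hits(q)))
```

In this toy setting every phrase ends up in exactly one query. A real system would submit the queries to a search engine and compare the returned hit counts with the estimates, which this sketch does not do.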