Capacity-constrained query formulation

  • Authors:
  • Matthias Hagen;Benno Maria Stein

  • Affiliations:
  • Faculty of Media, Bauhaus University Weimar, Germany;Faculty of Media, Bauhaus University Weimar, Germany

  • Venue:
  • ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Given a set of keyphrases, we analyze how Web queries with these phrases can be formed that, taken altogether, return a specified number of hits. The use case of this problem is a plagiarism detection system that searches the Web for potentially plagiarized passages in a given suspicious document. For the query formulation problem we develop a heuristic search strategy based on co-occurrence probabilities. Compared to the maximal termset strategy [3], which can be considered as the most sensible non-heuristic baseline, our expected savings are on average 50% when queries for 9 or 10 phrases are to be constructed.