Strategies for retrieving plagiarized documents

  • Authors: Benno Stein; Sven Meyer zu Eissen; Martin Potthast

  • Affiliations: Bauhaus University Weimar, Weimar, Germany (all authors)

  • Venue: SIGIR '07: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval

  • Year: 2007

Abstract

For the identification of plagiarized passages in large document collections, we present retrieval strategies that rely on stochastic sampling and chunk indexes. Using the entire Wikipedia corpus, we compile n-gram indexes and compare them to a new kind of fingerprint index in a plagiarism analysis use case. Our fingerprint index speeds up the analysis by a factor of 1.5 and is an order of magnitude smaller, while being equivalent in terms of precision and recall.
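
To make the notion of a chunk index concrete, the following is a minimal Python sketch of a hashed word n-gram index of the general kind the abstract uses as a baseline. It is not the authors' fingerprint index, whose construction the abstract does not describe. The function names, the chunk length n=3, and the 8-byte BLAKE2b hash are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def word_ngrams(text, n=3):
    """Split text into overlapping word n-grams (the 'chunks')."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def chunk_hash(chunk):
    """Map a chunk to a short hash key (assumed: 8-byte BLAKE2b digest)."""
    return hashlib.blake2b(chunk.encode("utf-8"), digest_size=8).hexdigest()


def build_index(docs, n=3):
    """Build an inverted index: chunk hash -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for chunk in word_ngrams(text, n):
            index[chunk_hash(chunk)].add(doc_id)
    return index


def candidates(index, query, n=3):
    """Rank documents by the number of distinct chunks shared with the query."""
    counts = defaultdict(int)
    for chunk in set(word_ngrams(query, n)):
        for doc_id in index.get(chunk_hash(chunk), ()):
            counts[doc_id] += 1
    # Most shared chunks first; ties broken by document id.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy collection, for illustration only.
docs = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "an entirely different sentence about retrieval systems",
}
index = build_index(docs)
print(candidates(index, "a quick brown fox jumps high"))
```

In this sketch every overlapping n-gram of every document is stored, so the index grows roughly with the number of words in the collection. That growth is the kind of cost that a more compact fingerprint index, as compared in the abstract, would aim to reduce.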