Retrieving candidate plagiarised documents using query expansion

  • Authors:
  • Rao Muhammad Adeel Nawab;Mark Stevenson;Paul Clough

  • Affiliations:
  • Department of Computer Science, University of Sheffield, UK;Department of Computer Science, University of Sheffield, UK;Information School, University of Sheffield, UK

  • Venue:
  • ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

External plagiarism detection systems compare suspicious texts against a reference collection to identify the original one(s). The suspicious text may not contain a verbatim copy of the reference collection since plagiarists often try to disguise their behaviour by altering the text. For large reference collections, such as those accessible via the internet, it is not practical to compare the suspicious text with every document in the reference collection. Consequently many approaches to plagiarism detection begin by identifying a set of candidate documents from the reference collection. We report an IR-based approach to the candidate document selection problem that uses query expansion to identify candidates which have been altered. The reported system outperforms a previously reported approach and is also robust to changes in the reference collection text.