Developing a corpus of plagiarised short answers

  • Authors:
  • Paul Clough; Mark Stevenson

  • Affiliations:
  • Department of Information Studies, University of Sheffield, Sheffield, UK S1 4DP; Department of Computer Science, University of Sheffield, Sheffield, UK S1 4DP

  • Venue:
  • Language Resources and Evaluation
  • Year:
  • 2011

Abstract

Plagiarism is widely acknowledged to be a significant and increasing problem for higher education institutions (McCabe 2005; Judge 2008). A wide range of solutions, including several commercial systems, have been proposed to assist the educator in the task of identifying plagiarised work, or even to detect it automatically. Direct comparison of these systems is made difficult by the problems in obtaining genuine examples of plagiarised student work. We describe our initial experiences with constructing a corpus consisting of answers to short questions in which plagiarism has been simulated. This corpus is designed to represent types of plagiarism that are not included in existing corpora and will be a useful addition to the set of resources available for the evaluation of plagiarism detection systems.