Automatic generation of benchmarks for plagiarism detection tools using grammatical evolution

  • Authors:
  • Manuel Cebrián;Manuel Alfonseca;Alfonso Ortega

  • Affiliations:
  • Universidad Autónoma de Madrid, Madrid, Spain;Universidad Autónoma de Madrid, Madrid, Spain;Universidad Autónoma de Madrid, Madrid, Spain

  • Venue:
  • Proceedings of the 9th annual conference on Genetic and evolutionary computation
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Student plagiarism is a mayor problem in universities worldwide. In this paper,we focus on plagiarism in answers to computer programming assignments,where student mix and/or modify one or more original solutions to obtain counterfeits. Although several software tools have been implemented to help the tedious and time consuming task of detecting plagiarism, little has been done to assess their quality, because, in fact, determining the original subset of the whole solutionset is practically impossible for graders. In this article we present a Grammatical Evolution technique which generates benchmarks. Given a programming language, our technique generates a set of original solutions to an assignment, together with a set of plagiarisms of the former set which mimic the way in which students act. The phylogeny of the coded solutions is predefined, providing a base for the evaluationof the performance of copy-catching tools. We give empirical evidence of the suitability of our approach by studying the behavior of one state-of-the-art detection tool (AC) on four benchmarks coded in APL2, generated with this technique.