A novel framework to detect source code plagiarism: now, students have to work for real!

  • Authors:
  • Boris Lesner;Romain Brixtel;Cyril Bazin;Guillaume Bagan

  • Affiliations:
  • GREYC, Caen Cedex - France;GREYC, Caen Cedex - France;GREYC, Caen Cedex - France;GREYC, Caen Cedex - France

  • Venue:
  • Proceedings of the 2010 ACM Symposium on Applied Computing
  • Year:
  • 2010

Abstract

Our work focuses on detecting plagiarism within a corpus of source code. The case study is assisting a human grader in finding plagiarism among source files written by Computer Science students. Like other approaches, we use the notion of a similarity distance. However, in this work we introduce segmentation to split documents into smaller parts, and we propose a document-wise distance based on the cost of permuting segments to transform one document into another. Our framework is laid out as a pipeline in which each stage can be parameterized to build a plagiarism detector that fits the user's needs. The approach makes no assumption about the programming language being analyzed. Furthermore, it provides a summary report of the results to ease decision making, since we consider that only a human user has the final word on whether a case is plagiarism. We tested our framework on hundreds of real source files, covering many programming languages, which allowed us to discover previously undetected frauds.
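To make the idea of a segment-permutation distance concrete, here is a toy sketch. It is not the authors' method. The segmentation rule (one segment per non-blank, whitespace-normalized line), the greedy exact matching, and the cost formula (unmatched segments plus adjacent swaps needed to reorder the matched ones, normalized to [0, 1]) are all our own hypothetical choices. The function names `segment`, `count_inversions` and `permutation_distance` are likewise illustrative. Line-based segmentation is used here because, like the abstract's framework, it makes no assumption about the programming language.

```python
from typing import List


def segment(doc: str) -> List[str]:
    """Hypothetical segmentation: one segment per non-blank line,
    with runs of whitespace collapsed to a single space."""
    return [" ".join(line.split()) for line in doc.splitlines() if line.strip()]


def count_inversions(seq: List[int]) -> int:
    """Pairs (i, j) with i < j and seq[i] > seq[j]; for distinct values this
    equals the minimum number of adjacent swaps needed to sort seq."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])


def permutation_distance(doc_a: str, doc_b: str) -> float:
    """Toy distance in [0, 1]: 0 for identical segment sequences, 1 when no
    segment of one document appears in the other (hypothetical formula)."""
    a, b = segment(doc_a), segment(doc_b)
    if not a and not b:
        return 0.0
    used = [False] * len(b)
    positions = []  # index in b of each matched segment, in a's order
    for seg in a:
        # Greedy exact matching: take the first unused identical segment in b.
        for j, other in enumerate(b):
            if not used[j] and other == seg:
                used[j] = True
                positions.append(j)
                break
    matched = len(positions)
    unmatched = (len(a) - matched) + (len(b) - matched)
    max_swaps = matched * (matched - 1) // 2
    # Cost = segments with no counterpart + swaps to reorder the matched ones;
    # dividing by this upper bound keeps the result within [0, 1].
    cost = unmatched + count_inversions(positions)
    return cost / (len(a) + len(b) + max_swaps)


if __name__ == "__main__":
    original = "int a = 0;\nint b = 1;\nreturn a + b;"
    reordered = "int b = 1;\n    int a = 0;\nreturn a + b;"
    print(permutation_distance(original, reordered))      # small: one swap
    print(permutation_distance(original, "print('hi')"))  # 1.0: nothing shared
```

In this sketch, swapping two lines of an otherwise identical three-line file (and re-indenting one of them) yields a distance of 1/9, about 0.11, because only one adjacent swap is needed. An unrelated file scores 1.0. A real detector would likely need fuzzier segment matching than exact string equality.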