PDetect: A Clustering Approach for Detecting Plagiarism in Source Code Datasets

  • Authors: Lefteris Moussiades; Athena Vakali

  • Affiliations: Division of Computing Systems, Department of Industrial Informatics, Technological Educational Institute of Kavala, GR-65404 Kavala, Greece; Department of Informatics, Aristotle University, 54124 Thessaloniki, Greece

  • Venue: The Computer Journal
  • Year: 2005

Abstract

Efficient detection of plagiarism in students' programming assignments is of great importance to the educational process. This paper presents a clustering-oriented approach to the problem of source code plagiarism. The implemented software, called PDetect, accepts a set of program sources as input and extracts subsets (the clusters of plagiarism) such that all programs within a particular subset have been derived from the same original. PDetect proposes an appropriate measure for evaluating plagiarism detection performance and supports the idea of combining different plagiarism detection schemes. Furthermore, a cluster analysis is performed to provide information beneficial to the plagiarism detection process. PDetect is designed so that it can be easily adapted to any keyword-based programming language, and it compares favourably with earlier state-of-the-art plagiarism detection approaches.
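
The abstract describes the pipeline only at a high level: program sources go in, clusters of plagiarism come out. As a rough illustration of that idea, and not a reproduction of PDetect itself, the following Python sketch represents each program by its keyword counts, compares programs pairwise, and reports groups of mutually linked programs. The keyword list, the multiset Jaccard similarity, the 0.8 threshold, and the names `keyword_profile`, `jaccard`, and `plagiarism_clusters` are all assumptions made for this example; the abstract does not specify any of them.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword list (assumption); a real tool would load the
# complete keyword set of the target programming language.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float", "void"}


def keyword_profile(source):
    """Count how often each keyword occurs in the program text."""
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    return Counter(t for t in tokens if t in KEYWORDS)


def jaccard(a, b):
    """Multiset Jaccard similarity of two keyword profiles (assumed measure)."""
    union = sum((a | b).values())  # per-keyword maximum of the counts
    if union == 0:
        return 0.0
    return sum((a & b).values()) / union  # per-keyword minimum of the counts


def plagiarism_clusters(programs, threshold=0.8):
    """Return sets of program names linked by similarity >= threshold.

    Clusters are the connected components of the similarity graph;
    programs that are linked to no other program are left out.
    """
    profiles = {name: keyword_profile(src) for name, src in programs.items()}
    parent = {name: name for name in programs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(programs, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in programs:
        groups.setdefault(find(name), set()).add(name)
    return [g for g in groups.values() if len(g) > 1]
```

Calling `plagiarism_clusters` with a dictionary that maps file names to source text returns a list of sets of file names. Renaming identifiers does not change a keyword profile, which is why two copies that differ only in variable names end up in the same cluster under this sketch.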