Instructor-centric source code plagiarism detection and plagiarism corpus

  • Authors: Jonathan Y.H. Poon; Kazunari Sugiyama; Yee Fan Tan; Min-Yen Kan

  • Affiliations: National University of Singapore, Singapore, Singapore; National University of Singapore, Singapore, Singapore; KAI Square, Singapore, Singapore; National University of Singapore, Singapore, Singapore

  • Venue: Proceedings of the 17th ACM annual conference on Innovation and technology in computer science education
  • Year: 2012

Abstract

Existing source code plagiarism systems focus on identifying plagiarism between pairs of submissions. Detection, while essential, is only a small part of managing plagiarism in an instructional setting. Holistic plagiarism detection and management requires coordinating and sharing assignment similarity information, elevating plagiarism detection from pairwise similarity to cluster-based similarity, from a single assignment to a sequence of assignments in the same course, and even across instructors of different courses. To address these shortcomings, we have developed Student Submissions Integrity Diagnosis (SSID), an open-source system that provides holistic plagiarism detection in an instructor-centric way. SSID's visualizations give an overview of plagiarism clusters across all assignments in a course and highlight the submissions most similar to those of any specific student. SSID supports plagiarism detection workflows, e.g., allowing student assistants to flag suspicious assignments for later review and confirmation by an instructor with proper authority. Evidence is automatically entered into SSID's logs and shared among instructors. We have additionally collected a source code plagiarism corpus, which we use to identify and correct shortcomings of previous plagiarism detection engines and to tune parameters for SSID deployment. Since its deployment, SSID's workflow enhancements have made plagiarism detection in our faculty less tedious and more successful.
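
The step from pairwise similarity to cluster-based similarity can be illustrated with a minimal sketch. This is not SSID's actual algorithm, which the abstract does not describe. It assumes pairwise similarity scores in [0, 1] are already available and treats every pair at or above a hypothetical threshold as an edge, reporting the connected components as clusters. The function name `cluster_submissions`, the default threshold of 0.8, and the sample scores are all illustrative assumptions.

```python
from collections import defaultdict


def cluster_submissions(pair_scores, threshold=0.8):
    """Group submissions into clusters of mutually linked suspects.

    pair_scores maps (submission_a, submission_b) to a similarity score.
    Two submissions share a cluster if a chain of pairs, each scoring at
    or above `threshold` (an assumed cut-off), connects them.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for submission in list(parent):
        groups[find(submission)].add(submission)

    # Report only clusters with at least two members, in a stable order.
    clusters = [sorted(g) for g in groups.values() if len(g) > 1]
    return sorted(clusters, key=lambda g: g[0])


if __name__ == "__main__":
    # Hypothetical scores for one assignment.
    scores = {
        ("alice", "bob"): 0.92,
        ("bob", "carol"): 0.85,
        ("alice", "carol"): 0.40,
        ("dave", "erin"): 0.88,
        ("carol", "dave"): 0.10,
    }
    print(cluster_submissions(scores))
```

In this sketch, alice and carol end up in the same cluster even though their direct score is low, because both are linked through bob. A pairwise report would not surface that relationship, which is the kind of gap a cluster-level view addresses.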