Sentence-based natural language plagiarism detection

  • Authors: Daniel R. White; Mike S. Joy

  • Affiliations: University of Warwick, Coventry, United Kingdom (both authors)

  • Venue: Journal on Educational Resources in Computing (JERIC)
  • Year: 2004

Abstract

With increasing access to higher education in the United Kingdom, larger class sizes make it unrealistic to expect tutors to identify instances of peer-to-peer plagiarism by eye, so automated solutions are required. This document details a novel algorithm for comparing suspect documents at the sentence level. The algorithm has been implemented as a component of plagiarism detection software that detects similarities both in natural language documents and in comments within program source code. It is capable of detecting sophisticated obfuscation (such as paraphrasing, reordering, merging, and splitting sentences) as well as direct copying. The implemented algorithm has also been used to successfully detect plagiarism in real assignments at the university, and the software has been evaluated by comparison with other plagiarism detection tools.
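
The abstract does not give the details of the algorithm, so the following is only a minimal sketch of the general idea of sentence-level comparison, not the authors' method. It assumes a naive punctuation-based sentence splitter, a small hypothetical stop-word list, and an overlap coefficient on word sets with an arbitrary threshold of 0.6; all function names are invented for illustration.

```python
import re

# Hypothetical stop-word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is",
              "it", "that", "on", "for", "was", "by"}


def split_sentences(text):
    """Split text after '.', '!' or '?' (a deliberately naive rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def content_words(sentence):
    """Lower-case the sentence and return its set of non-stop-words."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return {w for w in words if w not in STOP_WORDS}


def sentence_similarity(a, b):
    """Overlap coefficient: shared words divided by the smaller set's size."""
    wa, wb = content_words(a), content_words(b)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / min(len(wa), len(wb))


def similar_sentence_pairs(doc_a, doc_b, threshold=0.6):
    """Return (i, j, score) for each sentence pair scoring >= threshold."""
    sents_a = split_sentences(doc_a)
    sents_b = split_sentences(doc_b)
    pairs = []
    # Every sentence of doc_a is compared with every sentence of doc_b,
    # so moving a sentence to a different position does not hide a match.
    for i, sa in enumerate(sents_a):
        for j, sb in enumerate(sents_b):
            score = sentence_similarity(sa, sb)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


doc_a = "The quick brown fox jumps over the lazy dog. Plagiarism detection is hard."
doc_b = "Detection of plagiarism is hard. Over the lazy dog the quick brown fox jumps."
pairs = similar_sentence_pairs(doc_a, doc_b)
print(pairs)
```

In this sketch, word sets make the score insensitive to word order within a sentence, and dividing by the smaller set means a fragment of a split sentence can still match the longer original. A real detector would need more than this, for example handling of synonyms introduced by paraphrasing.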