Detection of simple plagiarism in computer science papers

  • Authors:
  • Yaakov HaCohen-Kerner; Aharon Tayeb; Natan Ben-Dror

  • Affiliations:
  • Jerusalem College of Technology (Machon Lev) (all three authors)

  • Venue:
  • COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
  • Year:
  • 2010

Abstract

Plagiarism is the use of the language and thoughts of another work and their presentation as one's own original work. Plagiarism occurs at various levels in many domains, and in academic papers in particular. Therefore, diverse efforts have been made to identify plagiarism automatically. In this research, we developed software capable of detecting simple plagiarism. We built a corpus (C) containing 10,100 academic papers in computer science written in English, and two test sets of papers randomly chosen from C. A wide variety of baseline methods was developed to identify identical or similar papers, several of which are novel. The experimental results and their analysis yield interesting findings: some of the novel methods are among the best predictive methods.
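The abstract does not say which baseline methods were used. To make "identifying identical or similar papers" concrete, here is a minimal sketch of one generic baseline of that kind: compare two texts by the Jaccard overlap of their word 3-grams. This is an illustrative assumption, not the authors' method. The function names `word_ngrams`, `jaccard_similarity` and `is_suspicious`, along with the n-gram size and the 0.5 threshold, are hypothetical choices.

```python
from __future__ import annotations

import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Lowercase the text, split it into word tokens, and return its set of word n-grams."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Return |A ∩ B| / |A ∪ B| over the two texts' n-gram sets (0.0 if both sets are empty)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def is_suspicious(a: str, b: str, threshold: float = 0.5, n: int = 3) -> bool:
    """Flag a pair as possible simple plagiarism (hypothetical threshold, not from the paper)."""
    return jaccard_similarity(a, b, n) >= threshold


if __name__ == "__main__":
    original = "Plagiarism is the use of the language and thoughts of another work."
    copy = "Plagiarism is the use of the language and ideas of another work."
    print(round(jaccard_similarity(original, copy), 3))  # 0.538 (7 shared of 13 distinct 3-grams)
    print(is_suspicious(original, copy))                 # True
```

Swapping a single word changes the three 3-grams that contain it, so the pair above still shares 7 of its 13 distinct 3-grams. A fixed threshold like this catches near-verbatim copying but not paraphrase, which fits the "simple plagiarism" scope the abstract describes.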