Plag-Inn: intrinsic plagiarism detection using grammar trees

  • Authors:
  • Michael Tschuggnall;Günther Specht

  • Affiliations:
  • Databases and Information Systems, Institute of Computer Science, University of Innsbruck, Austria;Databases and Information Systems, Institute of Computer Science, University of Innsbruck, Austria

  • Venue:
  • NLDB'12 Proceedings of the 17th international conference on Applications of Natural Language Processing and Information Systems
  • Year:
  • 2012

Abstract

Intrinsic plagiarism detection deals with the task of finding plagiarized sections of text documents without using a reference corpus. This paper describes a novel approach to this task that processes and analyzes the grammar of a suspicious document. The main idea is to split a text into single sentences and to calculate a grammar tree for each of them. To find suspicious sentences, these grammar trees are compared pairwise in a distance matrix using the pq-gram distance, an alternative to the tree edit distance. Finally, sentences whose grammar differs significantly from the rest of the document, as judged against a Gaussian normal distribution, are marked as suspicious.
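
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions: grammar trees are taken as already parsed and given as nested (label, children) tuples; the pq-gram profile and its normalized bag distance follow the commonly used definition with p = 2 and q = 3; and a sentence is flagged when its mean distance to all other sentences exceeds the mean by more than `threshold` standard deviations, which is only one plausible reading of the Gaussian criterion. The function names and the `threshold` parameter are hypothetical.

```python
from collections import Counter
import statistics

DUMMY = "*"  # placeholder label for the dummy nodes that pad each pq-gram


def pq_gram_profile(tree, p=2, q=3):
    """Return the bag (Counter) of pq-gram label tuples of a tree.

    A tree is a (label, children) pair, where children is a list of trees.
    """
    profile = Counter()

    def visit(node, ancestors):
        label, children = node
        stem = (ancestors + (label,))[-p:]   # the node plus its p-1 ancestors
        if not children:
            profile[stem + (DUMMY,) * q] += 1  # a leaf gets q dummy children
            return
        window = (DUMMY,) * q
        for child in children:               # slide a window of q siblings
            window = window[1:] + (child[0],)
            profile[stem + window] += 1
            visit(child, stem)
        for _ in range(q - 1):               # pad after the last child
            window = window[1:] + (DUMMY,)
            profile[stem + window] += 1

    visit(tree, (DUMMY,) * p)
    return profile


def pq_gram_distance(a, b, p=2, q=3):
    """Normalized pq-gram distance in [0, 1]; 0 means identical profiles."""
    pa, pb = pq_gram_profile(a, p, q), pq_gram_profile(b, p, q)
    shared = sum((pa & pb).values())         # size of the bag intersection
    total = sum(pa.values()) + sum(pb.values())
    return 1.0 - 2.0 * shared / total


def suspicious_sentences(trees, threshold=2.0):
    """Return indices of sentences whose grammar deviates from the rest.

    Assumption: the test statistic is the mean distance to all other
    sentences, compared against mean + threshold * standard deviation.
    Requires at least two trees.
    """
    n = len(trees)
    dist = [[pq_gram_distance(trees[i], trees[j]) for j in range(n)]
            for i in range(n)]
    mean_dist = [sum(row) / (n - 1) for row in dist]  # self-distance is 0
    mu = statistics.mean(mean_dist)
    sigma = statistics.pstdev(mean_dist)
    return [i for i, d in enumerate(mean_dist)
            if sigma > 0 and d > mu + threshold * sigma]
```

In practice the trees would come from a syntactic parser run on each sentence; the sketch leaves that step out, and with only a handful of sentences the standard-deviation test has little statistical power, so `threshold` would need tuning on real documents.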