Plag-Inn: intrinsic plagiarism detection using grammar trees

Authors:
Michael Tschuggnall; Günther Specht
Affiliations:
Databases and Information Systems, Institute of Computer Science, University of Innsbruck, Austria (both authors)
Venue:
NLDB '12: Proceedings of the 17th International Conference on Applications of Natural Language Processing and Information Systems
Year:
2012

Abstract

Intrinsic plagiarism detection deals with the task of finding plagiarized sections of a text document without using a reference corpus. This paper describes a novel approach to this task that processes and analyzes the grammar of a suspicious document. The main idea is to split the text into single sentences and to compute a grammar tree for each sentence. To find suspicious sentences, the grammar trees are compared pairwise using the pq-gram distance, an alternative to the tree edit distance, and the results are stored in a distance matrix. Finally, sentences whose grammar differs significantly from the rest of the document with respect to a Gaussian normal distribution are marked as suspicious.
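The pq-gram distance compares two ordered labeled trees through their pq-grams. A pq-gram is a small subtree made of an anchor node, its p-1 nearest ancestors and q consecutive children. The tree is padded with null nodes ("*") so that every node can serve as an anchor. The distance is computed from the two bags (multisets) of pq-grams, which is much cheaper than computing a tree edit distance. The sketch below is a minimal illustration, not the authors' implementation. It makes several assumptions: p = 2 and q = 3, trees that keep only phrase and part-of-speech tags with the words dropped, and a hypothetical `Node` class standing in for the output of a real parser.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field

NULL = "*"  # label of the artificial null nodes used to pad the tree


@dataclass
class Node:
    """Hypothetical ordered labeled tree node, e.g. a parse-tree constituent."""
    label: str
    children: list[Node] = field(default_factory=list)


def pq_gram_profile(root: Node, p: int = 2, q: int = 3) -> Counter:
    """Return the bag of pq-grams of the tree as label tuples of length p + q."""
    profile: Counter = Counter()

    def visit(node: Node, ancestors: tuple) -> None:
        # stem: the p-1 nearest ancestors followed by the anchor node itself
        stem = (ancestors + (node.label,))[-p:]
        sib = (NULL,) * q
        if not node.children:
            # a leaf contributes one pq-gram whose q children are all null
            profile[stem + sib] += 1
            return
        for child in node.children:
            # slide a window of width q over the null-padded child list
            sib = (sib + (child.label,))[-q:]
            profile[stem + sib] += 1
            visit(child, stem)
        for _ in range(q - 1):
            sib = (sib + (NULL,))[-q:]
            profile[stem + sib] += 1

    visit(root, (NULL,) * p)
    return profile


def pq_gram_distance(t1: Node, t2: Node, p: int = 2, q: int = 3) -> float:
    """1 - 2|P1 ∩ P2| / |P1 ⊎ P2|: 0.0 for identical profiles, 1.0 if disjoint."""
    p1, p2 = pq_gram_profile(t1, p, q), pq_gram_profile(t2, p, q)
    shared = sum((p1 & p2).values())             # bag intersection
    total = sum(p1.values()) + sum(p2.values())  # size of the bag union
    return 1.0 - 2.0 * shared / total


# "The cat sleeps." and "It sleeps." as tag-only trees (words dropped)
t1 = Node("S", [Node("NP", [Node("DT"), Node("NN")]), Node("VP", [Node("VBZ")])])
t2 = Node("S", [Node("NP", [Node("PRP")]), Node("VP", [Node("VBZ")])])
print(round(pq_gram_distance(t1, t2), 4))  # 0.3846
```

The two sentences share all pq-grams anchored at `S`, `VP` and `VBZ` but none anchored inside the noun phrase. As a result, their distance lies strictly between 0 and 1.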
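The abstract says only that sentences which differ significantly with respect to a Gaussian normal distribution are marked as suspicious. It does not state the exact decision rule. The sketch below assumes one plausible reading. Pairwise sentence distances fill a symmetric matrix, and each sentence's mean distance to all other sentences is computed. A normal distribution is then fitted to those means by taking their mean μ and standard deviation σ. A sentence is flagged when its mean exceeds μ + kσ. The names `distance_matrix` and `mark_suspicious`, the threshold form and the default k = 1.0 are all hypothetical.

```python
import statistics
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def distance_matrix(items: Sequence[T], dist: Callable[[T, T], float]) -> list[list[float]]:
    """Symmetric matrix of pairwise distances; the diagonal stays 0.0."""
    n = len(items)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = dist(items[i], items[j])
    return m


def mark_suspicious(m: list[list[float]], k: float = 1.0) -> list[int]:
    """Indices whose mean distance exceeds mu + k * sigma (hypothetical rule)."""
    n = len(m)
    if n < 3:
        return []  # too few sentences to estimate a distribution
    # mean distance of each sentence to all other sentences (diagonal excluded)
    means = [sum(row[j] for j in range(n) if j != i) / (n - 1) for i, row in enumerate(m)]
    mu = statistics.fmean(means)
    sigma = statistics.pstdev(means)
    if sigma == 0.0:
        return []  # every sentence is equally (dis)similar to the rest
    return [i for i, x in enumerate(means) if x > mu + k * sigma]


# toy example: sentences 0-3 are close to each other, sentence 4 is an outlier
d = [[0.0, 0.2, 0.2, 0.2, 0.9],
     [0.2, 0.0, 0.2, 0.2, 0.9],
     [0.2, 0.2, 0.0, 0.2, 0.9],
     [0.2, 0.2, 0.2, 0.0, 0.9],
     [0.9, 0.9, 0.9, 0.9, 0.0]]
print(mark_suspicious(d))  # [4]
```

Filling only the upper triangle and mirroring it relies on the distance being symmetric, which holds for the pq-gram distance. In a full pipeline, `dist` would be a tree distance applied to per-sentence grammar trees.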