Plagiarism detection using stopword n-grams

Authors:
Efstathios Stamatatos
Affiliations:
Department of Information and Communication Systems Engineering, University of the Aegean, 83200, Karlovassi, Greece
Venue:
Journal of the American Society for Information Science and Technology
Year:
2011

Citing 0
Cited 2

UKP: computing semantic textual similarity by combining multiple content similarity measures

SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation

The use of orthogonal similarity relations in the prediction of authorship

CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume 2

Abstract

In this paper, a novel method for detecting plagiarized passages in document collections is presented. In contrast to previous work in this field that uses content terms to represent documents, the proposed method is based on a small list of stopwords (i.e., very frequent words). We show that stopword n-grams reveal important information for plagiarism detection: they capture syntactic similarities between suspicious and original documents, and they can be used to detect the exact boundaries of plagiarized passages. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive with the best reported results. More importantly, it achieves significantly better results on difficult plagiarism cases, where the plagiarized passages are highly modified and most of the words or phrases have been replaced with synonyms. © 2011 Wiley Periodicals, Inc.
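
The core idea of the abstract, that a document can be represented by its stopwords alone and compared through n-grams of that sequence, can be illustrated with a minimal Python sketch. This is not the author's implementation: the stopword list, the n-gram length n=3, the tokenizer, and the function names (stopword_sequence, stopword_ngrams, shared_ngrams) are all assumptions chosen for illustration, since the abstract does not specify them. The sketch also covers only the matching of n-grams, not the detection of passage boundaries.

```python
import re

# Hypothetical short stopword list, for illustration only; the abstract
# does not say which words the method actually uses.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "was", "it", "for",
             "with", "he", "be", "on", "i", "that", "by", "at", "you", "his"}


def stopword_sequence(text):
    """Lowercase the text and keep only the stopwords, in original order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t in STOPWORDS]


def stopword_ngrams(text, n=3):
    """Return the set of n-grams over the stopword-only sequence.

    n=3 is an assumed value that suits the short demo sentences below.
    """
    seq = stopword_sequence(text)
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def shared_ngrams(suspicious, original, n=3):
    """Stopword n-grams that occur in both texts."""
    return stopword_ngrams(suspicious, n) & stopword_ngrams(original, n)


original = "The results of the study were presented to the board in a report."
# Same sentence structure, with the content words replaced by synonyms.
rewritten = ("The findings of the investigation were shown to the committee "
             "in a document.")
unrelated = "He was at home with his dog, and it was raining."

print(len(shared_ngrams(rewritten, original)))   # stopword n-grams survive synonym replacement
print(len(shared_ngrams(unrelated, original)))   # no structural overlap with unrelated text
```

In this toy example the rewritten sentence shares no content words with the original, yet its stopword sequence is identical, which is the kind of syntactic similarity the abstract refers to for heavily modified passages.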