Plagiarism detection based on structural information

Authors: Efstathios Stamatatos
Affiliation: University of the Aegean, Karlovassi, Greece
Venue: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
Year: 2011

Abstract

This paper presents a novel method for detecting plagiarized passages in document collections. Unlike previous work in this field, which mainly uses content terms to represent documents, the proposed method is based on structural information provided by the occurrences of a small list of stopwords (i.e., very frequent words). We show that stopword n-grams are able to capture local syntactic similarities between suspicious and original documents. Moreover, we propose an algorithm for detecting the exact boundaries of plagiarized and source passages. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive with the best reported results. More importantly, it achieves significantly better results on difficult plagiarism cases, in which the plagiarized passages have been heavily modified by replacing most of the words or phrases with synonyms to hide their similarity to the source documents.
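The core idea can be illustrated with a short sketch. Each document is reduced to its sequence of stopwords, and the stopword n-grams it shares with a source document are matched. Because synonym substitution mostly replaces content words, the stopword skeleton tends to survive. The following Python is a minimal sketch that rests on several assumptions. The `STOPWORDS` list, the n-gram length `n=4`, and the `max_gap` threshold are hypothetical illustrative choices, not the paper's parameters. The greedy `merge_spans` step is a simplified stand-in, not the paper's boundary-detection algorithm.

```python
import re
from collections import defaultdict

# Hypothetical small stopword list (assumption); the paper uses its own
# list of very frequent English words, which is not reproduced here.
STOPWORDS = frozenset(
    "the of and a in to is was it for with he be on that by at "
    "as his from but had which she they or an were we their".split()
)


def stopword_tokens(text):
    """Keep only stopwords, with their (start, end) character offsets."""
    return [
        (m.group().lower(), m.start(), m.end())
        for m in re.finditer(r"[A-Za-z]+", text)
        if m.group().lower() in STOPWORDS
    ]


def ngram_index(tokens, n):
    """Map each stopword n-gram to the token positions where it starts."""
    index = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        key = tuple(tok for tok, _, _ in tokens[i:i + n])
        index[key].append(i)
    return index


def common_ngram_spans(susp, src, n=4):
    """Return (susp_start, susp_end, src_start, src_end) character spans
    for every stopword n-gram occurrence shared by both documents."""
    s_tok, o_tok = stopword_tokens(susp), stopword_tokens(src)
    o_index = ngram_index(o_tok, n)
    spans = []
    for key, s_positions in ngram_index(s_tok, n).items():
        for i in s_positions:
            for j in o_index.get(key, []):
                spans.append((s_tok[i][1], s_tok[i + n - 1][2],
                              o_tok[j][1], o_tok[j + n - 1][2]))
    return spans


def merge_spans(spans, max_gap=50):
    """Greedily merge matches whose gap on the suspicious side is at most
    max_gap characters. Source-side bounds are simply the min/max of the
    merged matches (a simplification of real boundary detection)."""
    passages = []
    for s0, s1, o0, o1 in sorted(spans):
        if passages and s0 - passages[-1][1] <= max_gap:
            p = passages[-1]
            passages[-1] = (p[0], max(p[1], s1), min(p[2], o0), max(p[3], o1))
        else:
            passages.append((s0, s1, o0, o1))
    return passages


if __name__ == "__main__":
    src = "The cat sat on the mat and it was happy with the sun in the window of the house."
    susp = "The dog lay on the rug and it was glad with the light in the pane of the home."
    for s0, s1, o0, o1 in merge_spans(common_ngram_spans(susp, src)):
        print(repr(susp[s0:s1]), "<-", repr(src[o0:o1]))
```

In this toy pair, every content word has been replaced but the stopword sequence is unchanged. As a result, all stopword 4-grams match and merge into a single passage in each document. A content-term representation would find almost no overlap here.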