This paper presents a novel method for detecting plagiarized passages in document collections. In contrast to previous work in this field, which mainly uses content terms to represent documents, the proposed method is based on structural information provided by occurrences of a small list of stopwords (i.e., very frequent words). We show that stopword n-grams are able to capture local syntactic similarities between suspicious and original documents. Moreover, an algorithm for detecting the exact boundaries of plagiarized and source passages is proposed. Experimental results on a publicly available corpus demonstrate that the performance of the proposed approach is competitive with the best reported results. More importantly, it achieves significantly better results on difficult plagiarism cases, where the plagiarized passages have been heavily modified by replacing most of the words or phrases with synonyms to hide the similarity with the source documents.
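To illustrate the idea of stopword n-grams, the following is a minimal Python sketch and not the authors' implementation. The stopword list, the n-gram length `n=3`, the tokenization, and the function names `stopword_ngrams` and `shared_ngrams` are all assumptions made for illustration; the abstract does not specify any of them. The sketch discards every content word, keeps only the sequence of stopwords, and compares the resulting n-gram sets of two texts. The boundary-detection algorithm mentioned above is not reproduced here, since the abstract gives no details about it.

```python
import re

# Hypothetical stopword list: the abstract only says "a small list of
# stopwords", so these particular words are an assumption.
STOPWORDS = frozenset({
    "the", "of", "and", "a", "in", "to", "is", "was",
    "it", "for", "with", "on", "that", "by", "at",
})


def stopword_ngrams(text, n=3):
    """Return the set of n-grams over the stopword-only token sequence."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Drop all content words; only the order of stopwords is kept.
    stop_seq = [t for t in tokens if t in STOPWORDS]
    return {tuple(stop_seq[i:i + n]) for i in range(len(stop_seq) - n + 1)}


def shared_ngrams(suspicious, source, n=3):
    """Stopword n-grams that occur in both texts (assumed similarity signal)."""
    return stopword_ngrams(suspicious, n) & stopword_ngrams(source, n)


source = "The cat sat on the mat in the house of the old man."
# Same sentence with every content word replaced by a synonym.
suspicious = "The feline rested on the rug in the home of the elderly gentleman."

common = shared_ngrams(suspicious, source)
print(sorted(common))
```

In this toy pair the two sentences share no content words, yet their stopword sequences are identical, so every stopword 3-gram of the source also appears in the suspicious text. This is the kind of case the abstract describes, where synonym replacement hides lexical similarity but leaves the surrounding syntactic structure intact.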