Retrieving candidate plagiarised documents using query expansion

Authors:
Rao Muhammad Adeel Nawab;Mark Stevenson;Paul Clough
Affiliations:
Department of Computer Science, University of Sheffield, UK;Department of Computer Science, University of Sheffield, UK;Information School, University of Sheffield, UK
Venue:
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Year:
2012

Citing 8
Cited 1

Plagiarism on the rise

Communications of the ACM - Hacking and innovation
Plagiarism Detection Based on Singular Value Decomposition

GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing
Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance

CICLing '09 Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Processing
Using syntactic information to identify plagiarism

EdAppsNLP 05 Proceedings of the second workshop on Building Educational Applications Using NLP
Syntactic constraints on paraphrases extracted from parallel corpora

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
An evaluation framework for plagiarism detection

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Developing a corpus of plagiarised short answers

Language Resources and Evaluation
Terrier information retrieval platform

ECIR'05 Proceedings of the 27th European conference on Advances in Information Retrieval Research

Folktale classification using learning to rank

ECIR'13 Proceedings of the 35th European conference on Advances in Information Retrieval

Quantified Score

Hi-index	0.00

Visualization

Abstract

External plagiarism detection systems compare suspicious texts against a reference collection to identify the original one(s). The suspicious text may not contain a verbatim copy of the reference collection since plagiarists often try to disguise their behaviour by altering the text. For large reference collections, such as those accessible via the internet, it is not practical to compare the suspicious text with every document in the reference collection. Consequently many approaches to plagiarism detection begin by identifying a set of candidate documents from the reference collection. We report an IR-based approach to the candidate document selection problem that uses query expansion to identify candidates which have been altered. The reported system outperforms a previously reported approach and is also robust to changes in the reference collection text.