External plagiarism detection systems compare a suspicious text against a reference collection to identify the source document(s) it was copied from. The suspicious text may not contain a verbatim copy of any passage in the reference collection, since plagiarists often try to disguise their behaviour by altering the text. For large reference collections, such as those accessible via the internet, it is not practical to compare the suspicious text with every document in the collection. Consequently, many approaches to plagiarism detection begin by selecting a set of candidate documents from the reference collection. We report an IR-based approach to this candidate document selection problem that uses query expansion to identify candidate sources whose text has been altered. The reported system outperforms a previously reported approach and is also robust to changes in the reference collection text.
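
The general idea of candidate selection with query expansion can be illustrated with a minimal sketch. The abstract does not state which retrieval model or expansion resource the reported system uses, so everything here is an assumption made for illustration: the TF-IDF style scoring, the toy `SYNONYMS` table, the `COLLECTION` documents, and the function names `expand` and `rank_candidates` are all hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical synonym table standing in for a real lexical resource.
SYNONYMS = {
    "buy": ["purchase"],
    "fast": ["quick", "rapid"],
    "car": ["automobile"],
}

# Toy reference collection (illustrative data only).
COLLECTION = {
    "src1": "She wanted to purchase a rapid automobile",
    "src2": "The weather was cold and the roads were icy",
    "src3": "He sold his old bicycle last week",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def expand(terms):
    """Append the listed synonyms of each query term (assumed expansion step)."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return expanded


def rank_candidates(suspicious, collection, k=2, use_expansion=True):
    """Return up to k document ids ranked by a simple TF-IDF style score."""
    docs = {doc_id: Counter(tokenize(text)) for doc_id, text in collection.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tf in docs.values() for term in tf)

    query = tokenize(suspicious)
    if use_expansion:
        query = expand(query)

    scores = {}
    for doc_id, tf in docs.items():
        scores[doc_id] = sum(
            tf[term] * math.log((n_docs + 1) / (doc_freq[term] + 0.5))
            for term in set(query)
            if term in tf
        )

    # Highest score first; ties broken by document id for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return [d for d in ranked if scores[d] > 0][:k]


if __name__ == "__main__":
    print(rank_candidates("buy fast car", COLLECTION, use_expansion=False))
    print(rank_candidates("buy fast car", COLLECTION, use_expansion=True))
```

In this toy setting, the altered query shares no words with any document, so plain term matching retrieves nothing, while the expanded query reaches the reworded source. A real system would replace the hand-written synonym table with a proper expansion resource and the scoring with an established retrieval model.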