An evaluation framework for plagiarism detection

Authors:
Martin Potthast;Benno Stein;Alberto Barrón-Cedeño;Paolo Rosso
Affiliations:
Bauhaus-Universität Weimar;Bauhaus-Universität Weimar;Universidad Politécnica de Valencia;Universidad Politécnica de Valencia
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Abstract

We present an evaluation framework for plagiarism detection. The framework comprises performance measures that address the specifics of plagiarism detection and the PAN-PC-10 corpus, which contains 64,558 artificial and 4,000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated and hence done on a large scale.
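
The abstract names the performance measures without defining them. As a rough illustration, the sketch below computes macro-averaged precision and recall at the character level, granularity, and the combined plagdet score (F1 divided by log2(1 + granularity)) as they are usually described for this framework. It rests on simplifying assumptions that are not part of this record: each plagiarism case and each detection is a half-open `(start, end)` character span in a single suspicious document, source-side offsets are ignored, and the function names `chars` and `plagdet` are hypothetical.

```python
from math import log2


def chars(span):
    """Expand a half-open (start, end) span into its set of character offsets."""
    start, end = span
    return set(range(start, end))


def plagdet(cases, detections):
    """Return (precision, recall, granularity, plagdet) for one document.

    Assumption: cases and detections are lists of (start, end) spans in the
    same suspicious document; empty spans are ignored.
    """
    case_sets = [s for s in map(chars, cases) if s]
    det_sets = [r for r in map(chars, detections) if r]
    all_case = set().union(*case_sets)
    all_det = set().union(*det_sets)

    # Macro-averaged precision: share of each detection covered by true cases.
    prec = (sum(len(r & all_case) / len(r) for r in det_sets) / len(det_sets)
            if det_sets else 0.0)
    # Macro-averaged recall: share of each true case covered by detections.
    rec = (sum(len(s & all_det) / len(s) for s in case_sets) / len(case_sets)
           if case_sets else 0.0)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0

    # Granularity: average number of detections per detected case
    # (1.0 when nothing is detected, so the score is not penalized twice).
    hits = [sum(1 for r in det_sets if r & s) for s in case_sets]
    detected = [h for h in hits if h > 0]
    gran = sum(detected) / len(detected) if detected else 1.0

    return prec, rec, gran, f1 / log2(1 + gran)


if __name__ == "__main__":
    # One true case reported as two adjacent fragments.
    print(plagdet([(0, 100)], [(0, 50), (50, 100)]))
```

In the usage example, precision and recall are both perfect, but the case is reported in two pieces, so granularity is 2 and the score drops to 1/log2(3), about 0.63. Reporting the same case as a single span `(0, 100)` yields a score of 1.0, which is the behavior the granularity term is meant to reward.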