We present an evaluation framework for plagiarism detection. The framework provides performance measures that address the specifics of plagiarism detection, and the PAN-PC-10 corpus, which contains 64 558 artificial and 4 000 simulated plagiarism cases, the latter generated via Amazon's Mechanical Turk. We discuss the construction principles behind the measures and the corpus, and we compare the quality of our corpus to existing corpora. Our analysis gives empirical evidence that the construction of tailored training corpora for plagiarism detection can be automated, and hence be done on a large scale.