The Internet has made huge amounts of information available, including source code. Source code repositories and, more generally, programming-related websites facilitate its reuse. In this work, we propose a simple approach to the detection of cross-language source code reuse, a problem that has barely been investigated. Our preliminary experiments, based on the comparison of character n-grams, show that considering different sections of the code (i.e., comments, code, reserved words, etc.) leads to different results. For the three programming languages considered (C++, Java, and Python), the best result is obtained when comments are discarded and the entire remaining source code is considered.
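As an illustration of what a character n-gram comparison with comments discarded might look like, here is a minimal sketch in Python. The abstract does not specify the n-gram length, the weighting scheme, or the similarity measure, so the choices below (trigrams, raw frequency counts, cosine similarity) are assumptions, as are all function names. The comment stripping is deliberately naive and does not account for comment markers inside string literals.

```python
import math
import re
from collections import Counter

# Naive comment patterns (assumption): they ignore string literals,
# so a "//" or "#" inside a quoted string is wrongly treated as a comment.
_COMMENT_PATTERNS = {
    "c_like": re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL),  # C++, Java
    "python": re.compile(r"#[^\n]*"),
}


def strip_comments(code, style):
    """Remove comments using the naive pattern for the given style."""
    return _COMMENT_PATTERNS[style].sub("", code)


def char_ngrams(text, n=3):
    """Count character n-grams after lowercasing and collapsing whitespace."""
    text = re.sub(r"\s+", " ", text.lower()).strip()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram frequency profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reuse_score(code_a, style_a, code_b, style_b, n=3):
    """Compare two programs on their comment-free character n-grams."""
    profile_a = char_ngrams(strip_comments(code_a, style_a), n)
    profile_b = char_ngrams(strip_comments(code_b, style_b), n)
    return cosine(profile_a, profile_b)


if __name__ == "__main__":
    java_src = "int total = 0; // running sum\nfor (int i = 0; i < n; i++) { total += i; }"
    python_src = "total = 0  # running sum\nfor i in range(n):\n    total += i"
    print(reuse_score(java_src, "c_like", python_src, "python"))
```

A higher score indicates more shared character n-grams between the two programs. Any threshold for flagging a pair as reuse would have to be tuned on labeled data, which this sketch does not attempt.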