Expert Systems with Applications: An International Journal
Program plagiarism detection is the task of detecting plagiarized code pairs among a set of source code files. In this paper, we propose a code plagiarism detection system that uses a parse tree kernel. The kernel computes a similarity value between two programs from the similarity of their parse trees. Since parse trees contain the essential syntactic structure of source code, the system handles structural information effectively. The contributions of this paper are twofold. First, we propose a parse tree kernel that is optimized for program source code. The evaluation shows that our system based on this kernel outperforms well-known baseline systems. Second, we collected a large number of real-world Java programs from a university programming class. This test set was manually analyzed and tagged by two independent human annotators to mark plagiarized code. It can be used to evaluate the performance of various detection systems in real-world environments. Experiments with the test set show that the performance of our plagiarism detection system reaches 93% of the level of human annotators.
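To make the idea of a parse tree kernel concrete, the following is a minimal sketch of a generic subtree kernel in the style commonly used for parse trees, with cosine-style normalization so that scores fall between 0 and 1. It is not the optimized kernel proposed in the paper, whose details the abstract does not give. The tuple-based tree encoding, the function names, and the `decay` parameter are all assumptions made for illustration.

```python
import math

# ASSUMPTION: a parse tree node is a tuple (label, child, child, ...);
# a leaf is a one-element tuple (label,). This encoding is hypothetical.

def nodes(tree):
    """Yield every node of the tree in preorder."""
    yield tree
    for child in tree[1:]:
        yield from nodes(child)

def production(node):
    """The node's label followed by the labels of its direct children."""
    return (node[0],) + tuple(child[0] for child in node[1:])

def common_fragments(n1, n2, decay, memo):
    """Decay-weighted count of tree fragments shared by n1 and n2 at their roots."""
    key = (n1, n2)
    if key in memo:
        return memo[key]
    if len(n1) == 1 or len(n2) == 1 or production(n1) != production(n2):
        # Leaves carry no production, and differing productions share nothing.
        result = 0.0
    else:
        # Same production: each child pair may stop here or extend further.
        result = decay
        for c1, c2 in zip(n1[1:], n2[1:]):
            result *= 1.0 + common_fragments(c1, c2, decay, memo)
    memo[key] = result
    return result

def tree_kernel(t1, t2, decay=0.5):
    """Sum shared-fragment counts over all pairs of nodes from the two trees."""
    memo = {}
    return sum(common_fragments(a, b, decay, memo)
               for a in nodes(t1) for b in nodes(t2))

def similarity(t1, t2, decay=0.5):
    """Normalized kernel value in [0, 1]; 0.0 if either tree has no productions."""
    denom = math.sqrt(tree_kernel(t1, t1, decay) * tree_kernel(t2, t2, decay))
    return tree_kernel(t1, t2, decay) / denom if denom > 0 else 0.0
```

In this sketch, two trees with identical structure score 1, and trees that share no production score 0. Because leaf labels are part of the productions here, renaming an identifier lowers the score; a practical detector would presumably normalize such tokens before building the trees.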