Source Code Similarity Detection Using Adaptive Local Alignment of Keywords
PDCAT '07 Proceedings of the Eighth International Conference on Parallel and Distributed Computing, Applications and Technologies
This paper proposes a new method for detecting plagiarized pairs of programs within a large collection of source code. The algorithms most widely used for code plagiarism detection to date are based on Greedy-String Tiling or on local alignment of two strings. This paper introduces a variant of local alignment, called adaptive local alignment, which uses an adaptive similarity matrix. Each entry of the adaptive similarity matrix is the logarithm of the probabilities of the keywords, estimated from their frequencies in a given set of programs. We evaluated the method on programs submitted to more than 10 real programming contests. According to the experimental results, the score distribution of adaptive local alignment is more sensitive than that of previous local alignments that use a fixed similarity matrix (+1 for a match, −1 for a mismatch, and −2 for a gap), and adaptive local alignment outperforms Greedy-String Tiling in detecting various plagiarism cases.
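To make the contrast between the fixed and adaptive similarity matrices concrete, the following is a minimal sketch, not the paper's implementation. It assumes programs are already reduced to keyword sequences, and it assumes a match on keyword k scores -log2 p(k), where p(k) is the keyword's relative frequency in the corpus, so that matches on rare keywords count for more. The abstract only states that entries are logarithms of keyword probabilities; the exact formula, the unchanged mismatch and gap penalties in the adaptive variant, and all function names here are illustrative assumptions.

```python
import math
from collections import Counter


def keyword_probabilities(programs):
    """Estimate each keyword's probability from its frequency in the corpus."""
    counts = Counter(k for prog in programs for k in prog)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def local_alignment_score(a, b, match, mismatch, gap):
    """Smith-Waterman style best local alignment score of sequences a and b.

    match is a function keyword -> score; mismatch and gap are constants.
    """
    prev = [0.0] * (len(b) + 1)
    best = 0.0
    for x in a:
        cur = [0.0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match(x) if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0.0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def fixed_score(a, b):
    """Fixed matrix from the abstract: +1 match, -1 mismatch, -2 gap."""
    return local_alignment_score(a, b, lambda k: 1.0, -1.0, -2.0)


def adaptive_score(a, b, probs, mismatch=-1.0, gap=-2.0):
    """ASSUMPTION: match score is -log2 p(keyword); penalties stay fixed."""
    return local_alignment_score(
        a, b, lambda k: -math.log2(probs[k]), mismatch, gap
    )
```

Under these assumptions, two programs that share only very common keywords receive a lower adaptive score than two programs that share an equally long run of rare keywords, whereas the fixed matrix scores both pairs identically.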