Determining whether computer program code is a student's original work or has been copied from another student or some other source is a major problem for many universities. Detection methods based on the information retrieval concepts of indexing and similarity matching scale well to large collections of files, but they require appropriate similarity functions to perform well. We used particle swarm optimization and genetic programming to evolve similarity functions suited to computer program code. Using a training set of plagiarised and non-plagiarised programs, we first evolved better parameter values for the previously published Okapi BM25 similarity function. We then used genetic programming to evolve completely new similarity functions that do not conform to any predetermined structure. The evolved similarity functions outperformed the human-developed Okapi BM25 function. A detection system using the evolved functions was also more accurate than the best code plagiarism detection system in use today, and it scales much better to large collections of files. These evolutionary computing techniques proved extremely useful for finding similarity functions that advance the state of the art in code plagiarism detection.
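To make the role of the tunable parameters concrete, the following is a minimal sketch of an Okapi BM25 similarity score applied to tokenised programs. It is an illustration under stated assumptions, not the system described above: the function names (`collection_stats`, `bm25`, `rank`), the toy token streams, and the tokenisation itself are hypothetical; the values `k1=1.2`, `b=0.75`, and `k3=1000` are commonly used defaults rather than the evolved values; and the inverse document frequency uses a non-negative variant so that very common tokens never reduce a score.

```python
import math
from collections import Counter


def collection_stats(corpus):
    """Return (document frequencies, number of documents, average length)."""
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each token once per program
    avg_len = sum(len(tokens) for tokens in corpus) / len(corpus)
    return doc_freq, len(corpus), avg_len


def bm25(query, doc, doc_freq, num_docs, avg_len, k1=1.2, b=0.75, k3=1000.0):
    """Okapi BM25 score of `doc` for `query`; k1, b, k3 are the tunable values.

    Assumption: defaults are conventional values, not the evolved ones.
    """
    query_tf, doc_tf = Counter(query), Counter(doc)
    # Length normalisation: b controls how strongly long programs are penalised.
    length_norm = k1 * ((1.0 - b) + b * len(doc) / avg_len)
    score = 0.0
    for term, qtf in query_tf.items():
        tf = doc_tf.get(term, 0)
        if tf == 0:
            continue
        df = doc_freq[term]
        # Assumption: non-negative idf variant, log(1 + (N - df + 0.5)/(df + 0.5)).
        idf = math.log(1.0 + (num_docs - df + 0.5) / (df + 0.5))
        doc_part = (k1 + 1.0) * tf / (length_norm + tf)
        query_part = (k3 + 1.0) * qtf / (k3 + qtf)
        score += idf * doc_part * query_part
    return score


def rank(query, corpus, **params):
    """Rank every program in `corpus` against `query`, best match first."""
    stats = collection_stats(corpus)
    scored = [(i, bm25(query, doc, *stats, **params)) for i, doc in enumerate(corpus)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical token streams standing in for three indexed programs.
CORPUS = [
    ["for", "id", "=", "num", "id", "<", "id", "id", "++"],
    ["while", "id", "<", "num", "id", "+=", "num"],
    ["return", "id", "*", "id"],
]
# A suspect submission that is a verbatim copy of the first program.
SUSPECT = list(CORPUS[0])

if __name__ == "__main__":
    for index, score in rank(SUSPECT, CORPUS):
        print(f"program {index}: {score:.3f}")
```

In this framing, a parameter search such as particle swarm optimization would treat `(k1, b, k3)` as a point in the search space and score each candidate by how well `rank` separates known plagiarised pairs from unrelated programs in the training set. Replacing the body of `bm25` with an arbitrary expression over the same term statistics corresponds to the unconstrained functions that genetic programming explores.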