The wide availability of documents on the Web makes plagiarism both tempting and easy. Plagiarism affects every kind of document, from natural-language texts to more structured information such as programs. Many tools and algorithms have been proposed to detect similarities and cope with this problem. In this paper we present a new algorithm designed to detect similarities in source code. Unlike existing methods, this algorithm relies on the notion of function and specifically targets obfuscation through the inlining and outlining of functions. The method is also resilient to insertions, deletions and permutations of instruction blocks. It is based on code factorization and uses adapted pattern-matching algorithms and data structures such as suffix arrays.
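The abstract does not spell out the algorithm itself, so the sketch below illustrates only the generic building block it names: a suffix array over a token sequence. It is combined with an LCP (longest common prefix) array built with Kasai's method to list token runs that occur at least twice. This is not the authors' factorization method and does not model inlining or outlining. The function names `suffix_array`, `lcp_array` and `repeated_runs`, the `min_len` threshold, and the `"$"` separator are hypothetical choices made for illustration. Suffixes are sorted naively (roughly O(n² log n)), which is fine for a demo but not for large code bases.

```python
from typing import List, Sequence, Tuple


def suffix_array(tokens: Sequence[str]) -> List[int]:
    # Naive construction (illustration only): sort suffix start positions
    # by the lexicographic order of the token suffix each one begins.
    return sorted(range(len(tokens)), key=lambda i: tuple(tokens[i:]))


def lcp_array(tokens: Sequence[str], sa: List[int]) -> List[int]:
    # Kasai's algorithm: lcp[k] is the length of the longest common prefix
    # of the suffixes starting at sa[k - 1] and sa[k]; lcp[0] is 0.
    n = len(tokens)
    rank = [0] * n
    for k, start in enumerate(sa):
        rank[start] = k
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # suffix immediately before i in sorted order
            while i + h < n and j + h < n and tokens[i + h] == tokens[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix shares at least h - 1 tokens
        else:
            h = 0
    return lcp


def repeated_runs(tokens: Sequence[str], min_len: int) -> List[Tuple[int, int, int]]:
    # Report (length, first_pos, second_pos) for every pair of suffixes that are
    # adjacent in sorted order and share at least min_len tokens, longest first.
    # Runs may overlap or be contained in longer runs; a real tool would filter them.
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    runs = []
    for k in range(1, len(sa)):
        if lcp[k] >= min_len:
            a, b = sorted((sa[k - 1], sa[k]))
            runs.append((lcp[k], a, b))
    runs.sort(reverse=True)
    return runs


if __name__ == "__main__":
    f = "x = a + b ; return x ;".split()
    g = "y = a + b ; print ( y ) ;".split()
    # "$" is assumed not to occur in either token stream, so no run can span both.
    tokens = f + ["$"] + g
    for length, i, j in repeated_runs(tokens, 4):
        print(length, i, j, tokens[i:i + length])
```

Concatenating the token streams of two functions around a unique separator lets a single suffix array expose the runs they share. That is why suffix structures suit comparing many program fragments at once. Tokenizing by whitespace is a simplification: a similarity detector would normally tokenize with a real lexer and abstract identifiers, so that renaming variables does not hide a match.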