Simple fast algorithms for the editing distance between trees and related problems
SIAM Journal on Computing
Recursive star-tree parallel data structure
SIAM Journal on Computing
A theory of parameterized pattern matching: algorithms and applications
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Algorithms on strings, trees, and sequences: computer science and computational biology
Algorithms on strings, trees, and sequences: computer science and computational biology
A Space-Economical Suffix Tree Construction Algorithm
Journal of the ACM (JACM)
Reducing the space requirement of suffix trees
Software—Practice & Experience
An algorithmic approach to the detection and prevention of plagiarism
ACM SIGCSE Bulletin
Introduction to algorithms
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
IEEE Transactions on Software Engineering
Constructing Suffix Trees On-Line in Linear Time
Proceedings of the IFIP 12th World Computer Congress on Algorithms, Software, Architecture - Information Processing '92, Volume 1 - Volume I
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications
CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
The complexity of theorem-proving procedures
STOC '71 Proceedings of the third annual ACM symposium on Theory of computing
Identifying Similar Code with Program Dependence Graphs
WCRE '01 Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE'01)
Clone Detection Using Abstract Syntax Trees
ICSM '98 Proceedings of the International Conference on Software Maintenance
Winnowing: local algorithms for document fingerprinting
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
Using Origin Analysis to Detect Merging and Splitting of Source Code Entities
IEEE Transactions on Software Engineering
Phoenix-based clone detection using suffix trees
Proceedings of the 44th annual Southeast regional conference
Comparison and Evaluation of Clone Detection Tools
IEEE Transactions on Software Engineering
Scenario-Based Comparison of Clone Detection Techniques
ICPC '08 Proceedings of the 2008 The 16th IEEE International Conference on Program Comprehension
Linear pattern matching algorithms
SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)
Depth-first search and linear grajh algorithms
SWAT '71 Proceedings of the 12th Annual Symposium on Switching and Automata Theory (swat 1971)
Finding Similarities in Source Code Through Factorization
Electronic Notes in Theoretical Computer Science (ENTCS)
Hi-index | 0.00 |
The detection of similarities in source code has applications not only in software re-engineering (to eliminate redundancies) but also in software plagiarism detection. This later can be a challenging problem since more or less extensive edits may have been performed on the original copy: insertion or removal of useless chunks of code, rewriting of expressions, transposition of code, inlining and outlining of functions, etc. In this paper, we propose a new similarity detection technique not only based on token sequence matching but also on the factorization of the function call graphs. The factorization process merges shared chunks (factors) of codes to cope, in particular, with inlining and outlining. The resulting call graph offers a view of the similarities with their nesting relations. It is useful to infer metrics quantifying similarity at a function level.