The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Interprocedural slicing using dependence graphs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Semantics-preserving procedure extraction
Proceedings of the 27th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
IEEE Transactions on Software Engineering
Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics
ICSM '96 Proceedings of the 1996 International Conference on Software Maintenance
Similarity Search in High Dimensions via Hashing
VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
ICSE '81 Proceedings of the 5th international conference on Software engineering
Clone Detection Using Abstract Syntax Trees
ICSM '98 Proceedings of the International Conference on Software Maintenance
Winnowing: local algorithms for document fingerprinting
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
An Evaluation of Clone Detection Techniques for Identifying Crosscutting Concerns
ICSM '04 Proceedings of the 20th IEEE International Conference on Software Maintenance
Clone Detection in Source Code by Frequent Itemset Techniques
SCAM '04 Proceedings of the Source Code Analysis and Manipulation, Fourth IEEE International Workshop
Similarity evaluation on tree-structured data
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Detecting higher-level similarity patterns in programs
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
GPLAG: detection of software plagiarism by program dependence graph analysis
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
ICSE '07 Proceedings of the 29th international conference on Software Engineering
CP-Miner: a tool for finding copy-paste and related bugs in operating system code
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Comparison and Evaluation of Clone Detection Tools
IEEE Transactions on Software Engineering
Comparison and evaluation of code clone detection techniques and tools: A qualitative approach
Science of Computer Programming
Complete and accurate clone detection in graph-based models
ICSE '09 Proceedings of the 31st International Conference on Software Engineering
ICSE '09 Proceedings of the 31st International Conference on Software Engineering
COMPASS: A Community-driven Parallelization Advisor for Sequential Software
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Automatic mining of functionally equivalent code fragments via random testing
Proceedings of the eighteenth international symposium on Software testing and analysis
Detecting code clones in binary executables
Proceedings of the eighteenth international symposium on Software testing and analysis
DebugAdvisor: a recommender system for debugging
Proceedings of the the 7th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
An empirical study on the maintenance of source code clones
Empirical Software Engineering
Finding similar defects using synonymous identifier retrieval
Proceedings of the 4th International Workshop on Software Clones
JCCD: a flexible and extensible API for implementing custom code clone detectors
Proceedings of the IEEE/ACM international conference on Automated software engineering
Matching dependence-related queries in the system dependence graph
Proceedings of the IEEE/ACM international conference on Automated software engineering
Scalable and systematic detection of buggy inconsistencies in source code
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
MeCC: memory comparison-based clone detector
Proceedings of the 33rd International Conference on Software Engineering
Value-based program characterization and its application to software plagiarism detection
Proceedings of the 33rd International Conference on Software Engineering
Proceedings of the 33rd International Conference on Software Engineering
Have your spaghetti and eat it too: evolutionary algorithmics and post-evolutionary analysis
Genetic Programming and Evolvable Machines
Semantic clone detection for model-based development of embedded systems
Proceedings of the 14th international conference on Model driven engineering languages and systems
Approximate graph clustering for program characterization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
AuDeNTES: Automatic Detection of teNtative plagiarism according to a rEference Solution
ACM Transactions on Computing Education (TOCE)
An empirical study on inconsistent changes to code clones at the release level
Science of Computer Programming
Minersoft: Software retrieval in grid and cloud computing infrastructures
ACM Transactions on Internet Technology (TOIT)
Automatic source code transformation for GPUs based on program comprehension
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Empirical Software Engineering
A first step towards algorithm plagiarism detection
Proceedings of the 2012 International Symposium on Software Testing and Analysis
CBCD: cloned buggy code detector
Proceedings of the 34th International Conference on Software Engineering
Can I clone this piece of code here?
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
Boreas: an accurate and scalable token-based approach to code clone detection
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
Proceedings of the 27th IEEE/ACM International Conference on Automated Software Engineering
Typestate-based semantic code search over partial programs
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
XIAO: tuning code clones at hands of engineers in practice
Proceedings of the 28th Annual Computer Security Applications Conference
Detecting missing method calls as violations of the majority rule
ACM Transactions on Software Engineering and Methodology (TOSEM)
RAMC: runtime abstract memory context based plagiarism detection in binary code
Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
Detecting source code similarity using code abstraction
Proceedings of the 7th International Conference on Ubiquitous Information Management and Communication
Juxtapp: a scalable system for detecting code reuse among android applications
DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
Searching for better configurations: a rigorous approach to clone evaluation
Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Using clone detection to find malware in acrobat files
CASCON '13 Proceedings of the 2013 Conference of the Center for Advanced Studies on Collaborative Research
Hi-index | 0.00 |
Several techniques have been developed for identifying similar code fragments in programs. These similar fragments, referred to as code clones, can be used to identify redundant code, locate bugs, or gain insight into program design. Existing scalable approaches to clone detection are limited to finding program fragments that are similar only in their contiguous syntax. Other, semantics-based approaches are more resilient to differences in syntax, such as reordered statements, related statements interleaved with other unrelated statements, or the use of semantically equivalent control structures. However, none of these techniques have scaled to real world code bases. These approaches capture semantic information from Program Dependence Graphs (PDGs), program representations that encode data and control dependencies between statements and predicates. Our definition of a code clone is also based on this representation: we consider program fragments with isomorphic PDGs to be clones. In this paper, we present the first scalable clone detection algorithm based on this definition of semantic clones. Our insight is the reduction of the difficult graph similarity problem to a simpler tree similarity problem by mapping carefully selected PDG subgraphs to their related structured syntax. We efficiently solve the tree similarity problem to create a scalable analysis. We have implemented this algorithm in a practical tool and performed evaluations on several million-line open source projects, including the Linux kernel. Compared with previous approaches, our tool locates significantly more clones, which are often more semantically interesting than simple copied and pasted code fragments.