Information Processing Letters
Incremental program testing using program dependence graphs
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A semantic model of program faults
ISSTA '96 Proceedings of the 1996 ACM SIGSOFT international symposium on Software testing and analysis
Parameterized Duplication in Strings: Algorithms and an Application to Software Maintenance
SIAM Journal on Computing
Pattern matching for clone and concept detection
Reverse engineering
Software watermarking: models and dynamic embeddings
Proceedings of the 26th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Fast Probabilistic Algorithms for Verification of Polynomial Identities
Journal of the ACM (JACM)
Operational and Semantic Equivalence Between Recursive Programs
Journal of the ACM (JACM)
Bugs as deviant behavior: a general approach to inferring errors in systems code
SOSP '01 Proceedings of the eighteenth ACM symposium on Operating systems principles
Software Testing Techniques
Principles of Program Analysis
Principles of Program Analysis
CCFinder: a multilinguistic token-based code clone detection system for large scale source code
IEEE Transactions on Software Engineering
Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics
ICSM '96 Proceedings of the 1996 International Conference on Software Maintenance
Operational Semantics and Program Equivalence
Applied Semantics, International Summer School, APPSEM 2000, Caminha, Portugal, September 9-15, 2000, Advanced Lectures
CIL: Intermediate Language and Tools for Analysis and Transformation of C Programs
CC '02 Proceedings of the 11th International Conference on Compiler Construction
A Sound Metalogical Semantics for Input/Output Effects
CSL '94 Selected Papers from the 8th International Workshop on Computer Science Logic
To the Functional Equivalence of Turing Machines
FCT '87 Proceedings of the International Conference on Fundamentals of Computation Theory
On finding duplication and near-duplication in large software systems
WCRE '95 Proceedings of the Second Working Conference on Reverse Engineering
Identifying Similar Code with Program Dependence Graphs
WCRE '01 Proceedings of the Eighth Working Conference on Reverse Engineering (WCRE'01)
Clone Detection Using Abstract Syntax Trees
ICSM '98 Proceedings of the International Conference on Software Maintenance
DMS®: Program Transformations for Practical Scalable Software Evolution
Proceedings of the 26th International Conference on Software Engineering
Clone Detection in Source Code by Frequent Itemset Techniques
SCAM '04 Proceedings of the Source Code Analysis and Manipulation, Fourth IEEE International Workshop
DART: directed automated random testing
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
An empirical study of code clone genealogies
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
CUTE: a concolic unit testing engine for C
Proceedings of the 10th European software engineering conference held jointly with 13th ACM SIGSOFT international symposium on Foundations of software engineering
DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones
ICSE '07 Proceedings of the 29th international conference on Software Engineering
Static error detection using semantic inconsistency inference
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
CP-Miner: a tool for finding copy-paste and related bugs in operating system code
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
Efficient online detection of dynamic control dependence
Proceedings of the 2007 international symposium on Software testing and analysis
Context-based detection of clone-related bugs
Proceedings of the the 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on The foundations of software engineering
Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering
Scalable detection of semantic clones
Proceedings of the 30th international conference on Software engineering
A Combined Static and Dynamic Software Birthmark Based on Component Dependence Graph
IIH-MSP '08 Proceedings of the 2008 International Conference on Intelligent Information Hiding and Multimedia Signal Processing
Detecting Theft of Java Applications via a Static Birthmark Based on Weighted Stack Patterns
IEICE - Transactions on Information and Systems
Automatic generation of random self-checking test cases
IBM Systems Journal
An Input/Output Semantics for Distributed Program Equivalence Reasoning
Electronic Notes in Theoretical Computer Science (ENTCS)
Exploiting traces in program analysis
TACAS'06 Proceedings of the 12th international conference on Tools and Algorithms for the Construction and Analysis of Systems
Mining API mapping for language migration
Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 1
A study of the uniqueness of source code
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Automatic workarounds for web applications
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
MeCC: memory comparison-based clone detector
Proceedings of the 33rd International Conference on Software Engineering
CBCD: cloned buggy code detector
Proceedings of the 34th International Conference on Software Engineering
Liberating the programmer with prorogued programming
Proceedings of the ACM international symposium on New ideas, new paradigms, and reflections on programming and software
Automatic recovery from runtime failures
Proceedings of the 2013 International Conference on Software Engineering
Hi-index | 0.00 |
Similar code may exist in large software projects due to some common software engineering practices, such as copying and pasting code and n-version programming. Although previous work has studied syntactic equivalence and small-scale, coarse-grained program-level and function-level semantic equivalence, it is not known whether significant fine-grained, code-level semantic duplications exist. Detecting such semantic equivalence is also desirable because it can enable many applications such as code understanding, maintenance, and optimization. In this paper, we introduce the first algorithm to automatically mine functionally equivalent code fragments of arbitrary size - down to an executable statement. Our notion of functional equivalence is based on input and output behavior. Inspired by Schwartz's randomized polynomial identity testing, we develop our core algorithm using automated random testing: (1) candidate code fragments are automatically extracted from the input program; and (2) random inputs are generated to partition the code fragments based on their output values on the generated inputs. We implemented the algorithm and conducted a large-scale empirical evaluation of it on the Linux kernel 2.6.24. Our results show that there exist many functionally equivalent code fragments that are syntactically different (i.e., they are unlikely due to copying and pasting code). The algorithm also scales to million-line programs; it was able to analyze the Linux kernel with several days of parallel processing.