Automatic mining of functionally equivalent code fragments via random testing

Authors:
Lingxiao Jiang;Zhendong Su
Affiliations:
University of California, Davis, Davis, CA, USA;University of California, Davis, Davis, CA, USA
Venue:
Proceedings of the eighteenth international symposium on Software testing and analysis
Year:
2009

Abstract

Similar code may exist in large software projects because of common software engineering practices, such as copying and pasting code and n-version programming. Although previous work has studied syntactic equivalence and small-scale, coarse-grained program-level and function-level semantic equivalence, it is not known whether significant fine-grained, code-level semantic duplications exist. Detecting such semantic equivalence is also desirable because it can enable many applications such as code understanding, maintenance, and optimization. In this paper, we introduce the first algorithm to automatically mine functionally equivalent code fragments of arbitrary size, down to a single executable statement. Our notion of functional equivalence is based on input and output behavior. Inspired by Schwartz's randomized polynomial identity testing, we develop our core algorithm using automated random testing: (1) candidate code fragments are automatically extracted from the input program; and (2) random inputs are generated to partition the code fragments based on their output values on the generated inputs. We implemented the algorithm and conducted a large-scale empirical evaluation of it on the Linux kernel 2.6.24. Our results show that there exist many functionally equivalent code fragments that are syntactically different (i.e., they are unlikely to be the result of copying and pasting code). The algorithm also scales to million-line programs; it was able to analyze the Linux kernel in several days of parallel processing.
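The partitioning step (2) described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, which works on code fragments extracted from C programs. Here the "fragments" are assumed to be small Python functions of one integer argument, and every name (`partition_by_behavior`, `FRAGMENTS`, the sample functions) is hypothetical. Fragments that produce the same outputs on a shared set of random inputs land in the same group.

```python
import random
from collections import defaultdict


def partition_by_behavior(fragments, num_inputs=10, seed=0, lo=-1000, hi=1000):
    """Group one-argument integer functions by their outputs on shared random inputs.

    Assumption for illustration: every fragment takes a single int and
    returns a hashable value without side effects.
    """
    rng = random.Random(seed)
    # All fragments are run on the same randomly generated inputs.
    inputs = [rng.randint(lo, hi) for _ in range(num_inputs)]

    clusters = defaultdict(list)
    for name, fn in fragments.items():
        # The tuple of outputs acts as a behavioral fingerprint.
        fingerprint = tuple(fn(x) for x in inputs)
        clusters[fingerprint].append(name)

    # Return groups in a stable order for easy inspection.
    return sorted(sorted(names) for names in clusters.values())


# Hypothetical fragments: three syntactically different ways to double a
# number, and one fragment with different behavior.
def double_by_add(x):
    return x + x


def double_by_mul(x):
    return 2 * x


def double_by_shift(x):
    return x << 1


def square(x):
    return x * x


FRAGMENTS = {
    "double_by_add": double_by_add,
    "double_by_mul": double_by_mul,
    "double_by_shift": double_by_shift,
    "square": square,
}

groups = partition_by_behavior(FRAGMENTS)
for group in groups:
    print(group)
```

Agreement on random inputs is evidence of functional equivalence, not a proof. Two fragments that differ only on rare inputs can still end up in the same group, so more inputs reduce, but do not eliminate, that risk.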