On finding duplication and near-duplication in large software systems

Author: B. S. Baker
Venue: WCRE '95, Proceedings of the Second Working Conference on Reverse Engineering
Year: 1995

Abstract

This paper describes how a program called dup can be used to locate instances of duplication or near-duplication in a software system. Dup reports both textually identical sections of code and sections that are textually identical except for a systematic substitution of one set of variable names and constants for another. Further processing locates longer sections of code that are the same except for other small modifications. Experimental results from running dup on millions of lines of code from two large software systems show that dup is both fast and effective at locating duplication. Applications could include identifying sections of code that should be replaced by procedures, eliminating duplication during reengineering of the system, redocumenting the system to include references to copies, and debugging.
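The idea of code that is "the same except for systematic substitution" of names and constants can be made concrete with a small sketch. It assumes, hypothetically, that identifiers (other than a small hard-coded keyword set) and numeric constants are the renameable parameters. It uses the "prev" encoding from parameterized pattern matching: each parameter becomes 0 at its first occurrence in a block and otherwise the distance back to its previous occurrence, so two blocks encode equally exactly when one is a one-to-one renaming of the other. The names `KEYWORDS`, `encode`, `find_pmatches`, and `renaming` are illustrative, not dup's interface. Rather than the parameterized suffix trees underlying Baker's work, this sketch hashes fixed-size windows of lines. It therefore reports only windows of exactly `min_lines` lines, not maximal matches, and it does not attempt the further processing for other small modifications.

```python
import re
from collections import defaultdict

# Hypothetical keyword set: these identifiers are kept literally
# instead of being treated as renameable parameters.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "char", "void"}

# Identifiers, integer constants, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split one source line into identifier, number, and punctuation tokens."""
    return TOKEN_RE.findall(line)


def is_param(tok):
    """Non-keyword identifiers and numeric constants count as parameters."""
    return (tok[0].isalnum() or tok[0] == "_") and tok not in KEYWORDS


def encode(lines):
    """Prev-encode a block of lines.

    Non-parameter tokens are kept as-is. Each parameter becomes ("P", 0) on its
    first occurrence in the block, else ("P", d) where d is the number of
    parameter tokens since its previous occurrence. Equal encodings mean the
    blocks differ only by a one-to-one renaming of parameters.
    """
    last_seen = {}  # parameter -> its index among parameter tokens
    count = 0       # parameter tokens seen so far in this block
    out = []
    for line in lines:
        row = []
        for tok in tokenize(line):
            if is_param(tok):
                row.append(("P", count - last_seen[tok] if tok in last_seen else 0))
                last_seen[tok] = count
                count += 1
            else:
                row.append(tok)
        out.append(tuple(row))
    return tuple(out)


def find_pmatches(lines, min_lines=3):
    """Return sorted pairs (i, j), i < j, of 0-based start lines whose
    min_lines-line windows match up to a consistent parameter renaming."""
    groups = defaultdict(list)
    for start in range(len(lines) - min_lines + 1):
        groups[encode(lines[start:start + min_lines])].append(start)
    pairs = []
    for starts in groups.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                pairs.append((starts[a], starts[b]))
    return sorted(pairs)


def renaming(lines_a, lines_b):
    """Map each parameter in block a to the token at the same spot in block b."""
    mapping = {}
    for la, lb in zip(lines_a, lines_b):
        for ta, tb in zip(tokenize(la), tokenize(lb)):
            if is_param(ta):
                mapping[ta] = tb
    return mapping


src = [
    "x = a + 1;",
    "if (x > max) max = x;",
    "return max;",
    "y = b + 2;",
    "if (y > top) top = y;",
    "return top;",
]
print(find_pmatches(src, 3))         # [(0, 3)]
print(renaming(src[0:3], src[3:6]))  # {'x': 'y', 'a': 'b', '1': '2', 'max': 'top'}
```

Here the first three lines and the last three differ only by the renaming x -> y, a -> b, 1 -> 2, max -> top, so they form the single reported pair. A block in which one name corresponded to two different names (for example `a = a;` against `a = b;`) would encode differently and would not be reported. Textually identical blocks are the special case where the renaming is the identity.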