Empirical evaluation of clone detection using syntax suffix trees

Authors:
Raimar Falke; Pierre Frenzel; Rainer Koschke
Affiliations:
University of Bremen, Bremen, Germany; University of Bremen, Bremen, Germany; University of Bremen, Bremen, Germany
Venue:
Empirical Software Engineering
Year:
2008

Abstract

Reusing software by copying and pasting is a persistent plague in software development, even though it creates serious maintenance problems. Various techniques have been proposed to find duplicated code (also known as software clones). A recent study compared these techniques and showed that token-based clone detection using suffix trees is fast but yields clone candidates that are often not syntactic units. Current techniques based on abstract syntax trees, on the other hand, find syntactic clones but are considerably less efficient. This paper describes how suffix trees can be used to find syntactic clones in abstract syntax trees. The new approach finds syntactic clones in linear time and space. The paper reports the results of a large case study in which we empirically compare the new technique to other techniques using the Bellon benchmark for clone detectors. The Bellon benchmark consists of clone pairs validated by humans for eight software systems written in C or Java from different application domains. The new contributions of this paper over the conference paper are the additional analysis of Java programs, the exploration of an alternative path that uses parse trees instead of abstract syntax trees, and the investigation of the impact on recall and precision when clone analyses insist on consistent parameter renaming.
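The core idea of the abstract, matching repeated sequences in a serialized syntax tree and then keeping only matches that are complete syntactic units, can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: Python's standard `ast` module stands in for the C and Java front ends, a naively sorted list of suffixes replaces the suffix tree (so the sketch does not run in linear time), each token pairs a node type with its subtree size, and the names `serialize`, `common_prefix_len`, `syntactic_clones` and `EXAMPLE` are hypothetical.

```python
import ast


def serialize(tree):
    """Flatten an AST in preorder into (node_type, subtree_size) tokens."""
    tokens, nodes = [], []

    def visit(node):
        start = len(tokens)
        tokens.append(None)  # placeholder until the subtree size is known
        nodes.append(node)
        for child in ast.iter_child_nodes(node):
            visit(child)
        tokens[start] = (type(node).__name__, len(tokens) - start)

    visit(tree)
    return tokens, nodes


def common_prefix_len(a, b):
    """Length of the longest common prefix of two token lists."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def syntactic_clones(source, min_size=8):
    """Return (node_type, line_a, line_b, size) tuples, largest clone first."""
    tokens, nodes = serialize(ast.parse(source))
    # Assumption: sorting all suffixes is a simple stand-in for a suffix tree.
    order = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    clones = set()
    for i, j in zip(order, order[1:]):
        length = common_prefix_len(tokens[i:], tokens[j:])
        p = 0
        while p < length:
            size = tokens[i + p][1]
            if p + size <= length:
                # The whole subtree lies inside the match: a syntactic unit.
                if size >= min_size:
                    a, b = sorted((i + p, j + p))
                    clones.add((tokens[a][0],
                                getattr(nodes[a], "lineno", None),
                                getattr(nodes[b], "lineno", None),
                                size))
                p += size
            else:
                # Subtree is cut off by the match boundary: look inside it.
                p += 1
    return sorted(clones, key=lambda c: -c[3])


EXAMPLE = """
def f(a, b):
    total = a + b
    if total > 10:
        total = total - 1
    return total

def g(x, y):
    s = x + y
    if s > 10:
        s = s - 1
    return s
"""

if __name__ == "__main__":
    for clone in syntactic_clones(EXAMPLE):
        print(clone)
```

Because the tokens carry node types rather than identifier names, the two functions in `EXAMPLE` are reported as clones although their variables differ. This sketch does not check whether the renaming is consistent.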