An evaluation of code similarity identification for the grow-and-prune model

  • Authors:
  • Thilo Mende; Rainer Koschke; Felix Beckwermert

  • Affiliation (all authors):
  • University of Bremen, Fachbereich 3, Postfach 33 04 40, 28334 Bremen, Germany

  • Venue:
  • Journal of Software Maintenance and Evolution: Research and Practice - Special Issue on the 12th Conference on Software Maintenance and Reengineering (CSMR 2008)
  • Year:
  • 2009

Abstract

When new functionality is required that is similar to existing functionality, developers often copy the code that implements the existing functionality and adjust the copy to the new requirements. The result of this copying is code growth. If developers face maintenance problems because changes must be made repeatedly to the original and all its copies, they may decide to merge the original and its copies again; that is, they prune the code. This approach was named the grow-and-prune model by Faust and Verhoef. This paper describes tool support for the grow-and-prune model in the evolution of software by identifying similar functions that may be merged. These functions are identified in two steps. First, token-based clone detection is used to detect pairs of functions sharing code. Second, the Levenshtein distance (LD) measures the textual similarity of these function pairs. Sufficient similarity at the function level is then lifted to the architectural level. The approach is evaluated in a case study of the Linux kernel. We give examples of instances of the grow-and-prune model in Linux. Then, we evaluate our technique quantitatively by measuring recall and precision with respect to an oracle. To obtain the oracle, we asked nine developers to decide whether they believe certain functions are similar and should be merged. The evaluation shows that the recall and precision of our technique are about 75%. Calculating the LD on token values rather than on characters is superior: the two metrics are strongly correlated, but the token-based calculation reduces runtime by a factor of 4.6. Clone detection is an effective filter that reduces the number of computations of the relatively expensive LD. Copyright © 2009 John Wiley & Sons, Ltd.
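
The sketch below is a minimal, hypothetical Python illustration (not the authors' tool) of the idea behind the second step: computing the Levenshtein distance over token sequences instead of characters and deriving a normalized similarity from it. The tokenized function bodies and the normalization are assumptions for illustration only; the paper itself filters candidate pairs with clone detection before applying the LD.

    def levenshtein(a, b):
        """Edit distance between two sequences (works for strings or token lists)."""
        m, n = len(a), len(b)
        prev = list(range(n + 1))          # previous row of the DP matrix
        for i in range(1, m + 1):
            curr = [i] + [0] * n
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + cost)  # substitution
            prev = curr
        return prev[n]

    def similarity(a, b):
        """Normalized similarity in [0, 1]; 1.0 means identical sequences."""
        if not a and not b:
            return 1.0
        return 1.0 - levenshtein(a, b) / max(len(a), len(b))

    # Token-level comparison: each lexical token is one symbol, so the DP matrix
    # is much smaller than for character-level LD on the same function bodies.
    f1 = ["int", "id", "(", "id", ")", "{", "return", "id", "*", "num", ";", "}"]
    f2 = ["int", "id", "(", "id", ")", "{", "return", "id", "+", "num", ";", "}"]
    print(similarity(f1, f2))                        # token-based
    print(similarity("return x*2;", "return x+2;"))  # character-based, for contrast

Because the number of token symbols in a function is much smaller than the number of characters, the quadratic dynamic-programming computation shrinks accordingly, which is consistent with the runtime reduction reported in the abstract.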