Phoenix-based clone detection using suffix trees

Authors:
Robert Tairas;Jeff Gray
Affiliations:
University of Alabama at Birmingham, Birmingham, AL;University of Alabama at Birmingham, Birmingham, AL
Venue:
Proceedings of the 44th annual Southeast regional conference
Year:
2006

Abstract

A code clone represents a sequence of statements that is duplicated in multiple locations of a program. Clones often arise in source code as a result of multiple cut/paste operations on the source, or due to the emergence of crosscutting concerns. Programs containing code clones can manifest problems during the maintenance phase. When a fault is found or an update is needed on the original copy of a code section, all similar clones must also be found so that they can be fixed or updated accordingly. The ability to detect clones becomes a necessity when performing maintenance tasks. However, if done manually, clone detection can be a slow and tedious activity that is also error prone. A tool that can automatically detect clones offers a significant advantage during software evolution. With such an automated detection tool, clones can be found and updated in less time. Moreover, restructuring or refactoring of these clones can yield better performance and modularity in the program.

This paper describes an investigation into an automatic clone detection technique developed as a plug-in for Microsoft's new Phoenix framework. Our investigation finds function-level clones in a program using abstract syntax trees (ASTs) and suffix trees. An AST provides the structural representation of the code after the lexical analysis process. The AST nodes are used to generate a suffix tree, which allows analysis on the nodes to be performed rapidly. We use the same methods that have been successfully applied to find duplicate sections in biological sequences to search for matches on the suffix tree that is generated, which in turn reveal matches in the code.
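The idea of searching a sequence of AST nodes for repeated runs can be illustrated with a minimal Python sketch. It is not the Phoenix plug-in described in the paper, and it swaps in a naively built suffix array (sorted suffixes compared pairwise) for a true suffix tree, which keeps the example short at the cost of the suffix tree's efficiency. The function name `repeated_runs`, the `min_len` parameter, and the node labels in the sample sequence are all hypothetical.

```python
def repeated_runs(tokens, min_len=3):
    """Map each repeated run of at least min_len tokens to its start positions.

    Stand-in technique: a naive suffix array rather than a suffix tree.
    Only runs shared by neighbouring sorted suffixes are reported.
    """
    n = len(tokens)
    # Sort suffix start positions by the suffix that begins there.
    order = sorted(range(n), key=lambda i: tokens[i:])
    found = {}
    for a, b in zip(order, order[1:]):
        # Length of the common prefix of two neighbouring suffixes.
        k = 0
        while a + k < n and b + k < n and tokens[a + k] == tokens[b + k]:
            k += 1
        if k >= min_len:
            run = tuple(tokens[a:a + k])
            found.setdefault(run, set()).update((a, b))
    return {run: sorted(positions) for run, positions in found.items()}


# Hypothetical AST node labels for a small function body.
nodes = ["decl", "assign", "call", "return", "if",
         "decl", "assign", "call", "return"]
clones = repeated_runs(nodes, min_len=4)
print(clones)
```

Each key in the result is a candidate clone, and its value lists where the run starts in the node sequence. A suffix tree yields the same repeats while avoiding the repeated suffix comparisons this sketch performs.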