Phoenix-based clone detection using suffix trees

  • Authors:
  • Robert Tairas;Jeff Gray

  • Affiliations:
  • University of Alabama at Birmingham, Birmingham, AL;University of Alabama at Birmingham, Birmingham, AL

  • Venue:
  • Proceedings of the 44th annual Southeast regional conference
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

A code clone represents a sequence of statements that are duplicated in multiple locations of a program. Clones often arise in source code as a result of multiple cut/paste operations on the source, or due to the emergence of crosscutting concerns. Programs containing code clones can manifest problems during the maintenance phase. When a fault is found or an update is needed on the original copy of a code section, all similar clones must also be found so that they can be fixed or updated accordingly. The ability to detect clones becomes a necessity when performing maintenance tasks. However, if done manually, clone detection can be a slow and tedious activity that is also error prone. A tool that can automatically detect clones offers a significant advantage during software evolution. With such an automated detection tool, clones can be found and updated in less time. Moreover, restructuring or refactoring of these clones can yield better performance and modularity in the program.This paper describes an investigation into an automatic clone detection technique developed as a plug-in for Microsoft's new Phoenix framework. Our investigation finds function-level clones in a program using abstract syntax trees (ASTs) and suffix trees. An AST provides the structural representation of the code after the lexical analysis process. The AST nodes are used to generate a suffix tree, which allows analysis on the nodes to be performed rapidly. We use the same methods that have been successfully applied to find duplicate sections in biological sequences to search for matches on the suffix tree that is generated, which in turn reveal matches in the code.