Clone detection via structural abstraction

  • Authors:
  • William S. Evans;Christopher W. Fraser;Fei Ma

  • Affiliations:
  • Department of Computer Science, University of British Columbia, Vancouver, Canada V6T1Z4;, Seattle, USA;Microsoft, One Microsoft Way, Redmond, USA 98052

  • Venue:
  • Software Quality Control
  • Year:
  • 2009

Abstract

This paper describes the design, implementation, and application of a new algorithm to detect cloned code. It operates on the abstract syntax trees formed by many compilers as an intermediate representation. It extends prior work by identifying clones even when arbitrary subtrees have been changed. These subtrees may represent structural rather than simply lexical code differences. In several hundred thousand lines of Java and C# code, 20 to 50% of the clones that we find involve these structural changes, which are not accounted for by previous methods. Our method also identifies cloning in declarations, so it is somewhat more general than conventional procedural abstraction.
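
The idea of matching syntax trees while letting arbitrary subtrees differ can be illustrated with a small sketch. The code below is not the paper's algorithm. It uses plain anti-unification of two trees, replacing each pair of differing subtrees with a hole, as one simple way to picture structural abstraction. The tuple encoding of trees, the names `anti_unify`, `size`, `holes`, and `is_clone`, and the thresholds `min_size` and `max_holes` are all assumptions made for illustration.

```python
HOLE = "?"  # placeholder for an abstracted subtree (assumes no real leaf is "?")

# Assumed encoding: a leaf is a string; an inner node is a tuple
# (label, child_1, ..., child_n).

def anti_unify(a, b):
    """Return the shared pattern of two trees; differing subtrees become HOLE."""
    if isinstance(a, str) or isinstance(b, str):
        return a if a == b else HOLE
    if a[0] != b[0] or len(a) != len(b):
        return HOLE  # different operator or arity: abstract the whole subtree
    return (a[0],) + tuple(anti_unify(x, y) for x, y in zip(a[1:], b[1:]))

def size(t):
    """Count the non-hole nodes of a pattern."""
    if isinstance(t, str):
        return 0 if t == HOLE else 1
    return 1 + sum(size(c) for c in t[1:])

def holes(t):
    """Count the holes of a pattern."""
    if isinstance(t, str):
        return 1 if t == HOLE else 0
    return sum(holes(c) for c in t[1:])

def is_clone(a, b, min_size=4, max_holes=2):
    """Treat two trees as clones if they share a large pattern with few holes."""
    p = anti_unify(a, b)
    return p != HOLE and size(p) >= min_size and holes(p) <= max_holes

# a[i] = a[i] + 1
stmt_a = ("assign", ("index", "a", "i"), ("add", ("index", "a", "i"), "1"))
# a[i] = f(b[j])   (differs structurally from stmt_a on the right-hand side)
stmt_b = ("assign", ("index", "a", "i"), ("call", "f", ("index", "b", "j")))
# x[i] = x[i] + 1  (differs from stmt_a only in an identifier)
stmt_c = ("assign", ("index", "x", "i"), ("add", ("index", "x", "i"), "1"))

pattern_ab = anti_unify(stmt_a, stmt_b)
pattern_ac = anti_unify(stmt_a, stmt_c)
print(pattern_ab)
print(pattern_ac)
```

In this sketch, `stmt_a` and `stmt_c` differ only in an identifier, which is the lexical kind of change, while `stmt_a` and `stmt_b` differ in a whole right-hand-side subtree, which is the structural kind of change the abstract refers to. A practical detector would also need a way to find candidate pairs without comparing every pair of subtrees, which this sketch leaves out.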