Empirical evaluation of clone detection using syntax suffix trees

  • Authors: Raimar Falke, Pierre Frenzel, Rainer Koschke

  • Affiliations: University of Bremen, Bremen, Germany (all authors)

  • Venue: Empirical Software Engineering
  • Year: 2008

Abstract

Reusing software through copying and pasting is a continuous plague in software development despite the fact that it creates serious maintenance problems. Various techniques have been proposed to find duplicated code (also known as software clones). A recent study has compared these techniques and shown that token-based clone detection based on suffix trees is fast but yields clone candidates that are often not syntactic units. Current techniques based on abstract syntax trees, on the other hand, find syntactic clones but are considerably less efficient. This paper describes how suffix trees can be used to find syntactic clones in abstract syntax trees. This new approach is able to find syntactic clones in linear time and space. The paper reports the results of a large case study in which we empirically compare the new technique to other techniques using the Bellon benchmark for clone detectors. The Bellon benchmark consists of clone pairs validated by humans for eight software systems written in C or Java from different application domains. The new contributions of this paper over the conference paper are the additional analysis of Java programs, the exploration of an alternative path that uses parse trees instead of abstract syntax trees, and the investigation of the impact on recall and precision when clone analyses insist on consistent parameter renaming.
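
The core idea of the abstract, finding repeated sequences in a serialized syntax tree and then keeping only complete syntactic units, can be illustrated with a small sketch. It is not the authors' implementation. It makes several assumptions: it analyzes Python source through the standard `ast` module rather than C or Java; it serializes the tree in preorder, with each element holding the node type and its number of descendants; and it substitutes a naive sorted-suffix comparison for a real suffix tree, so it does not run in linear time. The names `serialize`, `repeated_spans`, `syntactic_clones`, and `SAMPLE` are hypothetical.

```python
import ast


def serialize(tree):
    """Return the preorder list of (node type name, number of descendants)."""
    out = []

    def visit(node):
        index = len(out)
        out.append(None)  # placeholder until the subtree size is known
        for child in ast.iter_child_nodes(node):
            visit(child)
        out[index] = (type(node).__name__, len(out) - index - 1)

    visit(tree)
    return out


def repeated_spans(tokens, min_len):
    """Yield (lo, hi, length) for repeated token runs.

    Assumption: a naive sorted list of suffixes stands in for a suffix tree.
    Only lexicographically adjacent suffixes are compared.
    """
    order = sorted(range(len(tokens)), key=lambda i: tokens[i:])
    for a, b in zip(order, order[1:]):
        length = 0
        while (a + length < len(tokens) and b + length < len(tokens)
               and tokens[a + length] == tokens[b + length]):
            length += 1
        lo, hi = min(a, b), max(a, b)
        length = min(length, hi - lo)  # keep the two occurrences disjoint
        if length >= min_len:
            yield lo, hi, length


def syntactic_clones(source, min_nodes=6):
    """Cut repeated runs into complete subtrees of at least min_nodes nodes."""
    nodes = serialize(ast.parse(source))
    clones = set()
    for a, b, length in repeated_spans(nodes, min_nodes):
        offset = 0
        while offset < length:
            span = nodes[a + offset][1] + 1  # size of the subtree rooted here
            if offset + span <= length:
                # The whole subtree lies inside the repeat: a syntactic unit.
                if span >= min_nodes:
                    clones.add((a + offset, b + offset, span))
                offset += span
            else:
                # The subtree is cut off by the end of the repeat: descend.
                offset += 1
    return sorted(clones)


# Two functions that differ only in identifier names.
SAMPLE = """
def f(a, b):
    total = a + b
    return total * 2

def g(x, y):
    s = x + y
    return s * 2
"""

if __name__ == "__main__":
    for first, second, size in syntactic_clones(SAMPLE):
        print(first, second, size)
```

Each reported triple gives the positions of the two subtree roots in the serialized sequence and the number of nodes in the cloned subtree. Because only node types and subtree sizes are compared, identifier names are ignored, so the sketch also reports clones whose parameters have been renamed, whether consistently or not. Nested clones, such as a statement inside a cloned function, are reported as well.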