Identification of High-Level Concept Clones in Source Code

  • Authors:
  • Andrian Marcus; Jonathan I. Maletic

  • Venue:
  • Proceedings of the 16th IEEE International Conference on Automated Software Engineering
  • Year:
  • 2001

Abstract

Source code duplication occurs frequently within large software systems. Pieces of source code, functions, and data types are often duplicated in part, or in whole, for a variety of reasons. Programmers may simply be reusing a piece of code via copy and paste, or they may be "reinventing the wheel".

Previous research on the detection of clones has mainly focused on identifying pieces of code with similar (or nearly similar) structure. Our approach is to examine the source code text (comments and identifiers) and identify implementations of similar high-level concepts (e.g., abstract data types). The approach uses an information retrieval technique (i.e., latent semantic indexing) to statically analyze the software system and determine semantic similarities between source code documents (i.e., functions, files, or code segments). These similarity measures are used to drive the clone detection process.

The intention of our approach is to enhance and augment existing clone detection methods that are based on structural analysis. This synergistic use of methods will improve the quality of clone detection. A set of experiments is presented that demonstrates the use of a semantic similarity measure to identify clones within a version of NCSA Mosaic.
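The abstract describes comparing source code documents by applying latent semantic indexing to their comments and identifiers, then using the resulting similarity scores to propose clone candidates. What follows is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. It assumes NumPy and treats each snippet as one document. It splits identifiers on underscores and camelCase, and it applies a small hypothetical stop list and log weighting of term counts. A pair is reported when its cosine similarity meets a hypothetical cut-off of 0.9. The names `tokenize`, `lsi_similarity`, `candidate_clones`, `STOP`, and `DOCS` are illustrative only.

```python
import re
import numpy as np

# Assumption: a tiny stop list of C keywords and English function words;
# the preprocessing used in the paper may differ.
STOP = {"int", "char", "void", "return", "if", "else", "for", "while",
        "struct", "static", "const", "a", "an", "the", "of", "to"}

IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Splits camelCase / PascalCase / acronyms: "getURLParser" -> get, URL, Parser
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def tokenize(source: str) -> list[str]:
    """Turn comment words and identifiers into lowercase terms."""
    terms = []
    for ident in IDENT_RE.findall(source):
        for part in ident.split("_"):           # split snake_case
            for word in WORD_RE.findall(part):  # split camelCase
                w = word.lower()
                if w not in STOP:
                    terms.append(w)
    return terms


def lsi_similarity(docs: dict[str, str], k: int = 2):
    """Return document names and pairwise cosine similarity in a k-dim LSI space."""
    names = list(docs)
    token_lists = [tokenize(docs[n]) for n in names]
    vocab = sorted({t for toks in token_lists for t in toks})
    row = {t: i for i, t in enumerate(vocab)}

    # Term-by-document count matrix: terms are rows, documents are columns.
    A = np.zeros((len(vocab), len(names)))
    for j, toks in enumerate(token_lists):
        for t in toks:
            A[row[t], j] += 1
    A = np.log1p(A)  # assumption: simple log damping of raw counts

    # Truncated SVD: keep only the k largest singular values.
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one row per document

    norms = np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty projections
    unit = doc_vecs / norms
    return names, unit @ unit.T


def candidate_clones(docs: dict[str, str], k: int = 2, threshold: float = 0.9):
    """List document pairs whose LSI cosine similarity meets the threshold."""
    names, sim = lsi_similarity(docs, k)
    pairs = [(names[i], names[j], float(sim[i, j]))
             for i in range(len(names)) for j in range(i + 1, len(names))
             if sim[i, j] >= threshold]
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical toy input: two copies of a stack routine that differ only in
# naming convention, plus an unrelated drawing routine.
DOCS = {
    "stack_a.c": "/* push an item onto the stack */\n"
                 "void stack_push(Stack *s, int item);",
    "stack_b.c": "/* push an item onto the stack */\n"
                 "void StackPush(Stack *s, int item);",
    "draw.c": "/* draw the image on screen */\n"
              "void draw_image(Image *img, int x, int y);",
}

if __name__ == "__main__":
    for a, b, score in candidate_clones(DOCS, k=2, threshold=0.9):
        print(f"{a} ~ {b}: {score:.2f}")
    # prints: stack_a.c ~ stack_b.c: 1.00
```

In this toy input, `stack_push` and `StackPush` reduce to the same terms, so the two stack snippets are reported as a candidate pair. The drawing routine shares no vocabulary with them and is not reported. On a real system, k would be chosen well below the number of documents. That reduction is what lets LSI relate documents whose vocabulary overlaps only partially. The values of `k` and `threshold` here are for illustration only.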