Accurate and Efficient Structural Characteristic Feature Extraction for Clone Detection

  • Authors:
  • Hoan Anh Nguyen; Tung Thanh Nguyen; Nam H. Pham; Jafar M. Al-Kofahi; Tien N. Nguyen

  • Affiliations:
  • Electrical and Computer Engineering Department, Iowa State University, USA (all authors)

  • Venue:
  • FASE '09 Proceedings of the 12th International Conference on Fundamental Approaches to Software Engineering: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
  • Year:
  • 2009

Abstract

Structure-oriented approaches in clone detection have become popular in both code-based and model-based clone detection. However, existing methods for capturing structural information in software artifacts are either too computationally expensive to be efficient or too light-weight to be accurate in clone detection. In this paper, we present Exas, an accurate and efficient structural characteristic feature extraction approach that better approximates and captures the structure within the fragments of artifacts. Exas structural features are the sequences of labels and numbers built from nodes, edges, and paths of various lengths of a graph-based representation. A fragment is characterized by a structural characteristic vector of the occurrence counts of those features. We have applied Exas in building two clone detection tools for source code and models. Our analytic study and empirical evaluation on open-source software show that Exas and its algorithm for computing the characteristic vectors are highly accurate and efficient in clone detection.
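
The idea of a characteristic vector of feature occurrence counts can be illustrated with a minimal sketch. It is not the Exas implementation: the function name `characteristic_vector`, the `max_len` bound, the feature encoding as tagged tuples, and the choice of in-degree and out-degree as the "numbers" attached to a node label are all assumptions made for illustration. The sketch takes a fragment as a labeled directed graph and counts two kinds of features: a node's label together with its degrees, and the label sequence along each directed path of bounded length.

```python
from collections import Counter


def characteristic_vector(labels, edges, max_len=3):
    """Count structural features of a labeled directed graph.

    labels:  dict mapping node id -> label (hypothetical input format)
    edges:   iterable of (source id, target id) pairs
    max_len: longest path, in nodes, to turn into a feature (assumed bound)
    """
    successors = {node: [] for node in labels}
    in_degree = Counter()
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    features = Counter()

    # Node features: the label plus in-degree and out-degree.
    for node, label in labels.items():
        features[("node", label, in_degree[node], len(successors[node]))] += 1

    # Path features: label sequences along simple directed paths
    # that contain between 1 and max_len nodes.
    def walk(path):
        features[("path",) + tuple(labels[n] for n in path)] += 1
        if len(path) < max_len:
            for nxt in successors[path[-1]]:
                if nxt not in path:  # keep paths simple, so cycles terminate
                    walk(path + [nxt])

    for node in labels:
        walk([node])
    return features


# Two fragments with the same shape and labels but different node ids.
first = characteristic_vector({1: "if", 2: "call", 3: "call"}, [(1, 2), (1, 3)])
second = characteristic_vector({"x": "if", "y": "call", "z": "call"},
                               [("x", "y"), ("x", "z")])
print(first == second)
```

Because the features are built only from labels and local structure, the two fragments above yield the same counts regardless of how their nodes are named. Comparing such vectors is far cheaper than comparing the graphs directly, which is the trade-off between accuracy and cost that the abstract describes.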