CodeBlast: a two-stage algorithm for improved program similarity matching in large software repositories

  • Authors:
  • Anupam Bhattacharjee; Hasan M. Jamil

  • Affiliations:
  • Wayne State University; University of Idaho

  • Venue:
  • Proceedings of the 28th Annual ACM Symposium on Applied Computing
  • Year:
  • 2013

Abstract

In this paper, we present CodeBlast, an improved and novel directed graph matching algorithm for searching functionally similar program segments in software repositories with greater effectiveness and accuracy. CodeBlast uses a novel canonical labeling concept that captures order-independent data flow patterns in a program, encoding the program's functional semantics and aiding matching. CodeBlast supports both exact and approximate directed graph matching and is particularly suitable for matching Program Dependence Graphs (PDGs). Introducing the notion of semantic equivalence in CodeBlast helps discover clone matches with a precision and recall that were not achievable with systems such as JPlag, MOSS, and GPlag. We substantiate this claim through experimental evidence and a comparative analysis against these leading systems.
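The abstract does not specify how CodeBlast's canonical labels are computed. The sketch below illustrates the general idea of an order-independent label over a data-dependence graph using a substitute technique: Weisfeiler-Lehman-style label refinement, which is not the authors' method. Each node's label is repeatedly combined with the sorted labels of its predecessors and successors. The result therefore depends on data flow rather than on statement order or node numbering. The function names `refine_labels` and `graph_signature`, the statement-kind labels, and the example graphs are all hypothetical.

```python
from collections import Counter


def refine_labels(labels, edges, rounds=2):
    """Hypothetical WL-style refinement (not CodeBlast's algorithm).

    labels: dict mapping node id -> initial label (e.g. statement kind)
    edges:  iterable of (src, dst) data-dependence edges
    """
    preds = {n: [] for n in labels}
    succs = {n: [] for n in labels}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    current = {n: str(lbl) for n, lbl in labels.items()}
    for _ in range(rounds):
        # Combine each node's label with the *sorted* labels of its in- and
        # out-neighbours, so the result ignores node ids and statement order.
        current = {
            n: repr((current[n],
                     tuple(sorted(current[p] for p in preds[n])),
                     tuple(sorted(current[s] for s in succs[n]))))
            for n in labels
        }
    return current


def graph_signature(labels, edges, rounds=2):
    """Multiset of refined node labels, usable as a graph fingerprint."""
    return Counter(refine_labels(labels, edges, rounds).values())


# Program A: x = read(); y = read(); s = x + y; print(s)
a_labels = {1: "input", 2: "input", 3: "add", 4: "output"}
a_edges = [(1, 3), (2, 3), (3, 4)]

# Program B: the same computation with statements renumbered and reordered.
b_labels = {"a": "input", "b": "add", "c": "input", "d": "output"}
b_edges = [("a", "b"), ("c", "b"), ("b", "d")]

# Program C: y is read but never used (s = x + x), so the data flow differs.
c_labels = {1: "input", 2: "input", 3: "add", 4: "output"}
c_edges = [(1, 3), (3, 4)]

sig_a = graph_signature(a_labels, a_edges)
sig_b = graph_signature(b_labels, b_edges)
sig_c = graph_signature(c_labels, c_edges)
print(sig_a == sig_b, sig_a == sig_c)
```

Programs A and B receive the same signature despite different statement order, while C does not. Note that WL refinement is only a heuristic fingerprint. Some non-isomorphic graphs share a signature, so a real matcher still needs a verification stage.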