Pattern mining of cloned codes in software systems

  • Authors:
  • Wei Qu; Yuanyuan Jia; Michael Jiang

  • Affiliations:
  • Graduate University of Chinese Academy of Sciences, 80 East Zhongguancun Road, Haidian, Beijing 100190, PR China; Bioengineering Department, University of Illinois, Chicago, IL 60607, USA; Motorola Labs, Motorola Inc., Schaumburg, IL 60196, USA

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2014

Abstract

Pattern mining of cloned code in software systems is challenging because clones are modified in many ways and software code bases are large. Most existing approaches adopt a token-based software representation and mine patterns of cloned code by sequential analysis. Because of the intrinsic limitations of such spatial space analysis, these methods have difficulty handling statement reordering, statement insertion, and control replacement. Recently, graph-based models such as the program dependence graph have been used to address these issues. Although they improve accuracy, they introduce a problem of their own: their computational complexity is very high and grows dramatically with software size, which limits their use in practice. In this paper, we propose a novel pattern mining framework for cloned code in software systems. It efficiently exploits both the spatial space information and the graph space information of the software, and can therefore mine accurate patterns of cloned code. Preliminary experimental results show that the proposed approach outperforms other methods.
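
The limitation of sequential token analysis noted in the abstract can be illustrated with a small sketch. This is not the framework proposed in the paper. It is a toy comparison, written under the assumption that statements are plain strings and that a regular expression is an adequate tokenizer. The names `tokenize`, `sequence_similarity`, and `statement_bag_similarity` are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical toy tokenizer: identifiers, integers, or any single symbol.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(statement):
    """Split one statement string into a list of tokens."""
    return TOKEN_RE.findall(statement)


def sequence_similarity(a, b):
    """Order-sensitive similarity over the flattened token streams."""
    tokens_a = [t for s in a for t in tokenize(s)]
    tokens_b = [t for s in b for t in tokenize(s)]
    return SequenceMatcher(None, tokens_a, tokens_b, autojunk=False).ratio()


def statement_bag_similarity(a, b):
    """Order-insensitive similarity: overlap of the statement multisets."""
    bag_a = Counter(tuple(tokenize(s)) for s in a)
    bag_b = Counter(tuple(tokenize(s)) for s in b)
    shared = sum((bag_a & bag_b).values())
    return 2 * shared / (len(a) + len(b))


# Two fragments that differ only in the order of two independent statements.
original = ["x = read()", "y = read()", "z = x + y", "print(z)"]
reordered = ["y = read()", "x = read()", "z = x + y", "print(z)"]

print("token sequence:", sequence_similarity(original, reordered))
print("statement bag: ", statement_bag_similarity(original, reordered))
```

The sequential score drops below 1.0 for the reordered fragment even though the two fragments are the same clone, while the order-insensitive score stays at 1.0. The statement bag is only a stand-in: it ignores data and control dependences entirely, which is the information a program dependence graph would capture, at a much higher matching cost.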