Scalable clone detection using description logic

  • Authors:
  • Philipp Schugerl

  • Affiliations:
  • Concordia University, Montreal, PQ, Canada

  • Venue:
  • Proceedings of the 5th International Workshop on Software Clones
  • Year:
  • 2011

Abstract

The semantic web is slowly transforming the web as we know it into a machine-understandable pool of information that can be consumed and reasoned about by various clients. Source code is no exception to this trend, and various communities have proposed standards to share code as linked data. With the availability of large amounts of open source code published in publicly accessible repositories and the introduction of massively horizontally scaling frameworks and cloud computing infrastructure, a new era of software mining across information silos is reshaping the software engineering landscape. The previously unreachable goal of analyzing code at a global level, and therefore detecting global software clones, has become manageable. Description logic and semantic web reasoners have so far played only a minor role in this transformation and are mainly used to model source code data. In this paper, we introduce a clone detection algorithm that uses a semantic web reasoner and is based on the Hadoop map-reduce framework, which can scale horizontally to a large amount of data. We also define a novel and compact clone model that considers only control blocks and used data types while still yielding clone detection results similar to those of more complex representations. To validate our approach, we have compared our algorithm to some of the leading clone detection tools (CCFinder, JCD and Simian) and show differences in performance and detection precision.
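
To make the idea of a compact clone model concrete, the following minimal sketch groups code fragments by their sequence of control blocks and their set of used data types, in a map and reduce style. It is an illustration under stated assumptions, not the paper's algorithm: it runs in plain Python rather than on Hadoop, it uses exact key matching instead of a semantic web reasoner, and the fragment fields (`id`, `control_blocks`, `data_types`) and all function names are hypothetical.

```python
from collections import defaultdict


def map_fragment(fragment):
    """Map step: emit (model key, fragment id).

    The key is a simplified stand-in for a compact clone model: the ordered
    control blocks plus the unordered set of data types used. The field
    names are hypothetical and chosen only for this illustration.
    """
    key = (tuple(fragment["control_blocks"]), frozenset(fragment["data_types"]))
    return key, fragment["id"]


def reduce_candidates(pairs):
    """Reduce step: collect fragment ids that share the same model key."""
    groups = defaultdict(list)
    for key, fragment_id in pairs:
        groups[key].append(fragment_id)
    # Only groups with at least two fragments are clone candidates.
    return [sorted(ids) for ids in groups.values() if len(ids) > 1]


def find_clone_candidates(fragments):
    """Run the map and reduce steps sequentially over an in-memory list."""
    return reduce_candidates(map_fragment(f) for f in fragments)


if __name__ == "__main__":
    fragments = [
        {"id": "a", "control_blocks": ["for", "if"], "data_types": ["int", "List"]},
        {"id": "b", "control_blocks": ["for", "if"], "data_types": ["List", "int"]},
        {"id": "c", "control_blocks": ["while"], "data_types": ["String"]},
    ]
    print(find_clone_candidates(fragments))
```

In this sketch, fragments `a` and `b` fall into the same group because they share control structure and data types even though the types are listed in a different order, while `c` remains on its own. Because each map call depends only on a single fragment, the same shape of computation could in principle be distributed across many workers.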