Instant code clone search

  • Authors:
  • Mu-Woong Lee; Jong-Won Roh; Seung-won Hwang; Sunghun Kim

  • Affiliations:
  • Pohang University of Science and Technology (POSTECH), Pohang, South Korea; Pohang University of Science and Technology (POSTECH), Pohang, South Korea; Pohang University of Science and Technology (POSTECH), Pohang, South Korea; Hong Kong University of Science and Technology (HKUST), Hong Kong, Hong Kong

  • Venue:
  • Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
  • Year:
  • 2010

Abstract

In this paper, we propose a scalable, instant code clone search engine for large-scale software repositories. Commercial code search engines are available, but they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone search take a "post-mortem" approach, detecting clones only after code development is completed, and hence fail to return results instantly. In contrast, we combine the strengths of these two lines of research by supporting instant code clone detection. To achieve this goal, we propose scalable indexing structures on vector abstractions of code. Our proposed algorithms allow developers to detect clones of a given code segment among 1.7 million code segments from 492 open source projects with sub-second response times, without compromising the accuracy obtained by a state-of-the-art tool.
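
To make the idea of indexing vector abstractions of code concrete, here is a minimal Python sketch. It is not the paper's algorithm or index structure: the feature set (FEATURES), the L1 distance, the CloneIndex class, and the pruning by total token count are all assumptions chosen for illustration. Each code segment is abstracted into a vector of token-category counts, so that segments differing only in identifier names map to the same vector, and candidates are pruned using the fact that the difference between two vectors' sums never exceeds their L1 distance.

```python
import bisect
import re
from collections import Counter

# Hypothetical feature set; a real tool would likely use syntax-tree node types.
KEYWORDS = ("if", "for", "while", "return")
FEATURES = KEYWORDS + ("identifier", "number", "operator")


def to_vector(code):
    """Abstract a code segment into a tuple of token-category counts."""
    counts = Counter()
    for tok in re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code):
        if tok in KEYWORDS:
            counts[tok] += 1
        elif tok[0].isdigit():
            counts["number"] += 1
        elif tok[0].isalpha() or tok[0] == "_":
            counts["identifier"] += 1
        else:
            counts["operator"] += 1
    return tuple(counts[f] for f in FEATURES)


def l1(a, b):
    """L1 (Manhattan) distance between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class CloneIndex:
    """Toy index: entries sorted by vector sum so a query scans a narrow range."""

    def __init__(self, segments):
        # segments: dict mapping a segment name to its source text
        vectors = ((name, to_vector(code)) for name, code in segments.items())
        self.entries = sorted((sum(v), name, v) for name, v in vectors)
        self.sizes = [size for size, _, _ in self.entries]

    def search(self, code, max_dist):
        """Return sorted (distance, name) pairs with L1 distance <= max_dist."""
        q = to_vector(code)
        total = sum(q)
        # |sum(a) - sum(b)| <= l1(a, b), so entries outside this range cannot match.
        lo = bisect.bisect_left(self.sizes, total - max_dist)
        hi = bisect.bisect_right(self.sizes, total + max_dist)
        hits = []
        for _, name, v in self.entries[lo:hi]:
            dist = l1(q, v)
            if dist <= max_dist:
                hits.append((dist, name))
        return sorted(hits)
```

Building CloneIndex once over a repository and calling search for each query segment mirrors the build-once, query-instantly pattern the abstract describes, although a sorted list is far simpler than an index that would scale to millions of segments.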