Instant code clone search

Authors:
Mu-Woong Lee;Jong-Won Roh;Seung-won Hwang;Sunghun Kim
Affiliations:
Pohang University of Science and Technology (POSTECH), Pohang, South Korea;Pohang University of Science and Technology (POSTECH), Pohang, South Korea;Pohang University of Science and Technology (POSTECH), Pohang, South Korea;Hong Kong University of Science and Technology (HKUST), Hong Kong, Hong Kong
Venue:
Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering
Year:
2010

Abstract

In this paper, we propose a scalable instant code clone search engine for large-scale software repositories. Commercial code search engines are available, but they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone search take a "post-mortem" approach, detecting clones only after code development is completed, and hence cannot return results instantly. In contrast, we combine the strengths of these two lines of research by supporting instant code clone detection. To achieve this goal, we propose scalable indexing structures over vector abstractions of code. Our proposed algorithms allow developers to detect clones of a given code segment among 1.7 million code segments from 492 open source projects in sub-second response times, without compromising the accuracy obtained by a state-of-the-art tool.
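
To give a feel for what searching over "vector abstractions of code" could look like, here is a minimal Python sketch. It is not the paper's method: the abstract does not specify the vector abstraction or the indexing structure, so this example assumes a simple count of syntax-tree node types as the vector, L1 distance as the similarity measure, and a brute-force scan where the paper uses a scalable index. The names characteristic_vector, l1_distance, and nearest_segments are hypothetical.

```python
import ast
from collections import Counter


def characteristic_vector(source: str) -> Counter:
    """Map a code segment to counts of its AST node types.

    Assumption: this stands in for the paper's vector abstraction,
    which the abstract does not describe.
    """
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


def l1_distance(a: Counter, b: Counter) -> int:
    """Sum of absolute count differences over all node types."""
    return sum(abs(a[key] - b[key]) for key in a.keys() | b.keys())


def nearest_segments(query: str, corpus: dict, k: int = 1) -> list:
    """Return the k corpus segments closest to the query.

    Assumption: a linear scan replaces the paper's index, so this
    illustrates the search semantics, not the sub-second scalability.
    """
    query_vector = characteristic_vector(query)
    scored = [
        (name, l1_distance(query_vector, characteristic_vector(source)))
        for name, source in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])[:k]


corpus = {
    "add": "def add(a, b):\n    return a + b\n",
    "total": "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n",
}
# Same structure as "add", with identifiers renamed.
query = "def plus(x, y):\n    return x + y\n"
result = nearest_segments(query, corpus, k=1)
```

Because the vector counts node types rather than identifier names, the renamed query maps to the same vector as the "add" segment. A text-based search would treat the two as different strings. A production system would replace the linear scan with a multidimensional index so that lookups stay fast as the corpus grows.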