In this paper, we propose a scalable instant code clone search engine for large-scale software repositories. While commercial code search engines are available, they treat software as text and often fail to find semantically related code. Meanwhile, existing tools for semantic code clone search take a "post-mortem" approach: they detect clones only after code development is complete and hence cannot return results instantly. In contrast, we combine the strengths of these two lines of research by supporting instant code clone detection. To achieve this goal, we propose scalable indexing structures on vector abstractions of code. Our proposed algorithms allow developers to detect clones of a given code segment among 1.7 million code segments from 492 open source projects with sub-second response times, without compromising the accuracy achieved by a state-of-the-art tool.
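The abstract does not spell out the vector abstraction or the index, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each code segment is abstracted as a vector counting occurrences of syntax-node labels, and that two segments are clone candidates when their L1 distance is within a radius. The index is a textbook filter-and-refine scheme: because |Σq − Σv| ≤ ‖q − v‖₁, any segment whose total size differs from the query's by more than the radius can be skipped without computing its distance. All names (`to_vector`, `SizeSortedIndex`, `range_query`) and the toy data are hypothetical.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def to_vector(labels, vocab):
    """Hypothetical vector abstraction: count each syntax-node label in vocab order."""
    counts = Counter(labels)
    return tuple(counts.get(label, 0) for label in vocab)


def l1(a, b):
    """L1 (Manhattan) distance between two equal-length count vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


class SizeSortedIndex:
    """Illustrative index: vectors sorted by total size (sum of counts).

    Filter step: only vectors whose size lies in [size(q) - r, size(q) + r]
    can be within L1 distance r of q. Refine step: check the exact distance.
    """

    def __init__(self, vectors):
        # (size, segment id, vector); ids are unique, so vectors are never compared.
        self._entries = sorted((sum(v), i, v) for i, v in enumerate(vectors))
        self._sizes = [size for size, _, _ in self._entries]

    def range_query(self, q, radius):
        """Return sorted ids of all segments within L1 distance `radius` of q."""
        s = sum(q)
        lo = bisect_left(self._sizes, s - radius)
        hi = bisect_right(self._sizes, s + radius)
        return sorted(i for _, i, v in self._entries[lo:hi] if l1(v, q) <= radius)


# Toy usage with made-up segments (hypothetical labels, not real data).
vocab = ("assign", "call", "if", "loop")
segments = [
    ["assign", "call", "if"],                      # id 0
    ["assign", "call", "if", "call"],              # id 1
    ["loop", "loop", "assign"],                    # id 2
    ["assign"] * 4 + ["call", "call", "if", "loop"],  # id 3
]
vectors = [to_vector(s, vocab) for s in segments]
index = SizeSortedIndex(vectors)
query = to_vector(["assign", "call", "if"], vocab)
print(index.range_query(query, radius=1))  # → [0, 1]
```

Here segment 3 is pruned by the size filter alone, and segment 2 passes the filter but is rejected by the exact distance check. A production engine over millions of segments would need a more selective multi-dimensional index; this sketch only illustrates why indexing vector abstractions lets a query avoid scanning every segment.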