Clone detection is a research technique for analyzing software systems for similarities, with applications in software understanding, maintenance, evolution, license enforcement, and many other areas. The NiCad near-miss clone detection method has been shown to yield highly accurate results in both precision and recall. However, its naive two-step method has proven difficult to bring to the efficiency and scalability required for large-scale research applications: a first step parses the code to identify and normalize fragments, and a second step compares the fragments as lines of text using longest common subsequence (LCS). Rather than presenting the NiCad tool itself in detail, this paper focuses on our experience in migrating NiCad from an initial rapid prototype to a practical, scalable research tool. The process has increased overall performance by a factor of up to 40 and clone detection speed by a factor of over 400, while reducing memory and processor requirements enough to run on a standard laptop. We apply a sequence of four different kinds of performance optimizations and analyze the effect of each in detail. We believe that the lessons of our experience in migrating NiCad from research prototype to production performance may benefit others facing a similar problem.
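To make the line-based second step concrete, the following is a minimal Python sketch of comparing two already-normalized fragments by LCS. It is an illustration only, not NiCad's implementation: the function names, the `max_diff` threshold of 0.3, and the rule that the unmatched fraction must stay within the threshold for both fragments are all assumptions made for this example.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two lists of lines."""
    # Classic dynamic programming, keeping only the previous row
    # so memory stays proportional to len(b).
    prev = [0] * (len(b) + 1)
    for line_a in a:
        curr = [0]
        for j, line_b in enumerate(b, start=1):
            if line_a == line_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def is_near_miss_clone(a: List[str], b: List[str], max_diff: float = 0.3) -> bool:
    """Hypothetical criterion: the fraction of lines NOT covered by the
    LCS must be at most max_diff in each fragment."""
    if not a or not b:
        return False
    common = lcs_length(a, b)
    unmatched_a = (len(a) - common) / len(a)
    unmatched_b = (len(b) - common) / len(b)
    return unmatched_a <= max_diff and unmatched_b <= max_diff


if __name__ == "__main__":
    # Two normalized fragments that differ by one inserted line.
    frag_a = ["if (x > 0) {", "x = x - 1;", "y = y + x;", "}"]
    frag_b = ["if (x > 0) {", "x = x - 1;", "log(x);", "y = y + x;", "}"]
    print(lcs_length(frag_a, frag_b))
    print(is_near_miss_clone(frag_a, frag_b))
```

Comparing every pair of fragments this way costs time proportional to the product of the fragment lengths for each pair, which hints at why a naive version of this step is hard to scale to large systems.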