Although duplicated code is known to pose severe problems for software maintenance, it is difficult to identify in large systems. Many different techniques have been developed to detect software clones, some of which are very sophisticated, but are also expensive to implement and adapt. Lightweight techniques based on simple string matching are easy to implement, but how effective are they? We present a simple string-based approach which we have successfully applied to a number of different languages such as COBOL, JAVA, C++, PASCAL, PYTHON, SMALLTALK, C and PDP-11 ASSEMBLER. In each case the maximum time to adapt the approach to a new language was less than 45 minutes. In this paper we investigate a number of simple variants of string-based clone detection that normalize differences due to common editing operations, and assess the quality of clone detection for very different case studies. Our results confirm that this inexpensive clone detection technique generally achieves high recall and acceptable precision. Over-zealous normalization of the code before comparison, however, can result in an unacceptable number of false positives. Copyright © 2005 John Wiley & Sons, Ltd.
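To make the idea of string-based detection concrete, the following is a minimal sketch in Python. It is not the implementation evaluated in the paper. The function names (`normalize`, `find_clones`), the `min_length` threshold, and the specific normalization (drop a trailing line comment, remove all whitespace) are assumptions chosen for illustration only. The sketch normalizes each line, ignores lines that become empty, and reports non-overlapping runs of identical normalized lines.

```python
from collections import defaultdict


def normalize(line, comment_prefix="#"):
    """Drop a trailing line comment and all whitespace (an assumed normalization).

    Note: the split is naive and would also cut a comment marker inside a string.
    """
    code = line.split(comment_prefix, 1)[0]
    return "".join(code.split())


def find_clones(lines, min_length=3, comment_prefix="#"):
    """Return sorted (start_a, start_b, length) tuples for duplicated line runs.

    start_a and start_b are 0-based indices into the original input;
    length counts non-empty normalized lines in the run.
    """
    # Keep only lines that are non-empty after normalization,
    # remembering where each one came from in the original input.
    kept = [(i, normalize(text, comment_prefix)) for i, text in enumerate(lines)]
    kept = [(i, norm) for i, norm in kept if norm]

    # Index every position in `kept` by its normalized text.
    positions = defaultdict(list)
    for k, (_, norm) in enumerate(kept):
        positions[norm].append(k)

    clones = []
    for ks in positions.values():
        for x in range(len(ks)):
            for y in range(x + 1, len(ks)):
                a, b = ks[x], ks[y]
                # Skip pairs whose predecessors also match: they are
                # normally the interior of a run that starts earlier.
                if a > 0 and kept[a - 1][1] == kept[b - 1][1]:
                    continue
                # Extend the match, never letting the first copy
                # run into the start of the second copy.
                length = 0
                while (b + length < len(kept)
                       and a + length < b
                       and kept[a + length][1] == kept[b + length][1]):
                    length += 1
                if length >= min_length:
                    clones.append((kept[a][0], kept[b][0], length))
    return sorted(clones)


sample = [
    "total = 0",
    "for x in items:",
    "    total += x",
    "print(total)",
    "",
    "total=0   # reset",
    "for x in items:",
    "\ttotal += x",
    "print( total )",
]
print(find_clones(sample))
```

Removing all whitespace is deliberately aggressive here. It illustrates the trade-off noted in the abstract: stronger normalization catches more edited copies but can also make unrelated lines compare equal. Adapting the sketch to another language would mainly mean changing `comment_prefix` and the normalization rule.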