An automated approach for abstracting execution logs to execution events
Journal of Software Maintenance and Evolution: Research and Practice - Special Issue on Program Comprehension through Dynamic Analysis (PCODA)
An information retrieval process to aid in the analysis of code clones
Empirical Software Engineering
An evaluation of code similarity identification for the grow-and-prune model
Journal of Software Maintenance and Evolution: Research and Practice - Special Issue on the 12th Conference on Software Maintenance and Reengineering (CSMR 2008)
Near-miss function clones in open source software: an empirical study
Journal of Software Maintenance and Evolution: Research and Practice - Working Conference on Reverse Engineering (WCRE 2008)
Extracting code clones for refactoring using combinations of clone metrics
Proceedings of the 5th International Workshop on Software Clones
Automated type-3 clone oracle using Levenshtein metric
Proceedings of the 5th International Workshop on Software Clones
Representing clones in a localized manner
Proceedings of the 5th International Workshop on Software Clones
Proceedings of the 5th International Workshop on Software Clones
Hi-index | 0.00 |
Clones are code segments that have been created by copying-and-pasting from other code segments. Clones occur often in large software systems. It is reported that 5 to 50% of the source code of a large software system is cloned. A major challenge when studying code cloning in large software systems is handling the large amount of clone candidates produced by leading edge clone detection tools. For example, the CCFinder, clone detection tool, produces over 7 million pairs of clone candidates for the Linux kernel (which consists of over 4 MLOC). Moreover, the output of clone detection tools grows rapidly as a software system evolves. Researchers and developers need tools to help them study the large amount of clone data in order to better understand the clone phenomena in large systems. In this paper, we propose a data mining framework to help researchers cope with the large amount of data produced by clone detection tools. We propose techniques to reduce, abstract and highlight the most interesting data produced by clone detection tools. Our framework also introduces a visualization tool which allows users to query and explore clone data at various abstraction levels. We demonstrate our framework on a case study of the clone phenomena in the Linux kernel.