Studies have shown that substantial code reuse is common in both open source and commercial projects. However, the precise extent of reuse and its impact on productivity and quality have not been well investigated in the open source context. We previously introduced a simple-to-use method that needs only a set of file pathnames to identify directories that share filenames, and we partially validated its performance on a set of closed-source projects. To evaluate this method and to improve reuse detection at the file level, we apply it to the FreeBSD project. We also apply four additional file copy detection methods, which use the content of multiple versions of the source code. The evaluation quantified the unique advantages of each method and showed that the filename method detected roughly half of all reuse cases. A remaining challenge is to scale the content-based methods to large repositories containing all versions of open source files.
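The filename method is described here only at a high level: it takes a set of file pathnames and reports directories that share filenames. The Python sketch below shows one way such matching could work. It assumes that two directories are flagged when they share at least `min_shared` filenames and when those names cover at least `min_ratio` of the smaller directory's files. The function name, both thresholds, and the example paths are hypothetical. The exact matching rule used by the method is not stated here.

```python
from collections import Counter, defaultdict
from itertools import combinations
import posixpath


def shared_filename_directories(paths, min_shared=2, min_ratio=0.5):
    """Return (dir_a, dir_b, shared_count, ratio) for directory pairs sharing filenames.

    Hypothetical sketch: min_shared and min_ratio are illustrative
    thresholds, not parameters taken from the original method.
    """
    # Map each directory to the set of filenames it contains.
    files_in_dir = defaultdict(set)
    for path in paths:
        directory, name = posixpath.split(path)
        files_in_dir[directory].add(name)

    # Inverted index: filename -> directories that contain it.
    dirs_with_name = defaultdict(set)
    for directory, names in files_in_dir.items():
        for name in names:
            dirs_with_name[name].add(directory)

    # Count shared filenames for every directory pair that shares at least one.
    shared = Counter()
    for dirs in dirs_with_name.values():
        for pair in combinations(sorted(dirs), 2):
            shared[pair] += 1

    results = []
    for (a, b), count in shared.items():
        # Fraction of the smaller directory's filenames that also appear in the other.
        ratio = count / min(len(files_in_dir[a]), len(files_in_dir[b]))
        if count >= min_shared and ratio >= min_ratio:
            results.append((a, b, count, round(ratio, 2)))
    # Largest overlaps first; ties broken by directory names.
    return sorted(results, key=lambda r: (-r[2], r[0], r[1]))


# Hypothetical pathnames: a library copied into two places in one tree.
paths = [
    "contrib/zlib/deflate.c",
    "contrib/zlib/inflate.c",
    "contrib/zlib/zutil.h",
    "sys/net/zlib/deflate.c",
    "sys/net/zlib/inflate.c",
    "sys/net/zlib/zutil.h",
    "bin/ls/ls.c",
]
print(shared_filename_directories(paths))
```

The inverted index avoids comparing every pair of directories. Only pairs that share at least one filename are counted. A practical version would likely also need to discount ubiquitous names such as `Makefile` or `README`, which would otherwise link unrelated directories. This is a design assumption, not something stated above.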