Evaluation of source code copy detection methods on FreeBSD

Authors:
Hung-Fu Chang;Audris Mockus
Affiliations:
University of Southern California, Los Angeles, CA, USA;Avaya Labs Research, Basking Ridge, NJ, USA
Venue:
Proceedings of the 2008 international working conference on Mining software repositories
Year:
2008

Abstract

Studies have shown that substantial code reuse is common in both open source and commercial projects. However, the precise extent of reuse and its impact on productivity and quality have not been well investigated in the open source context. We previously introduced a simple-to-use method that needs only a set of file pathnames to identify directories that share filenames, and we partially validated its performance on a set of closed-source projects. To evaluate this method and to improve reuse detection at the file level, we apply it to the FreeBSD project together with four additional file copy detection methods that use the underlying content of multiple versions of the source code. The evaluation quantified the unique advantages of each method and showed that the filename method detected roughly half of all reuse cases. Scaling the content-based methods to large repositories containing all versions of open source files remains a challenge.
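The idea behind the filename method (find directories that share filenames, given nothing but a list of pathnames) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `shared_filename_dirs`, the thresholds `min_shared` and `min_fraction`, and the sample paths are all assumptions made for illustration, and the abstract does not state the matching criteria actually used.

```python
from collections import defaultdict
from itertools import combinations
import posixpath


def shared_filename_dirs(paths, min_shared=2, min_fraction=0.5):
    """Return directory pairs whose filename sets overlap.

    Hypothetical criteria (not taken from the paper): a pair is reported
    when it shares at least `min_shared` filenames and those shared names
    cover at least `min_fraction` of the smaller directory.
    """
    # Group bare filenames by the directory that contains them.
    names_by_dir = defaultdict(set)
    for path in paths:
        directory, filename = posixpath.split(path)
        if filename:
            names_by_dir[directory].add(filename)

    matches = []
    # Compare every pair of directories (quadratic; fine for a sketch).
    for dir_a, dir_b in combinations(sorted(names_by_dir), 2):
        shared = names_by_dir[dir_a] & names_by_dir[dir_b]
        smaller = min(len(names_by_dir[dir_a]), len(names_by_dir[dir_b]))
        if len(shared) >= min_shared and len(shared) / smaller >= min_fraction:
            matches.append((dir_a, dir_b, sorted(shared)))
    return matches


# Made-up pathnames for illustration only.
example_paths = [
    "projA/lib/zlib/inflate.c",
    "projA/lib/zlib/deflate.c",
    "projA/lib/zlib/zutil.h",
    "projB/contrib/zlib/inflate.c",
    "projB/contrib/zlib/deflate.c",
    "projB/contrib/zlib/crc32.c",
    "projB/src/main.c",
]

for dir_a, dir_b, shared in shared_filename_dirs(example_paths):
    print(dir_a, dir_b, shared)
```

Because only pathnames are inspected, a sketch like this cannot tell a true copy from two unrelated files that happen to share a name, which is one reason content-based methods are a natural complement.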