Constructing universal version history

Authors:
Hung-Fu Chang; Audris Mockus
Affiliations:
University of Southern California, Los Angeles, CA; Avaya Labs Research, Basking Ridge, NJ
Venue:
Proceedings of the 2006 international workshop on Mining software repositories
Year:
2006

Abstract

Developers often copy code for parts of a product, or for an entire product, to start a new product or a new release. To understand software change history and to determine code authorship, we propose constructing a universal version history from multiple version control repositories. To that end, we create two practical code copy detection methods that operate at the level of the source code file: the prefix-postfix algorithm and the prefix algorithm. The full pathname of a file and its version history are used to construct the universal version history of the file by linking together the change histories of files that had the same code at any point in the past. Both algorithms assume that developers often duplicate files by copying entire directories. Once copying is identified, we propose an algorithm that links version histories from multiple repositories to construct the universal version history. The results show that about 41.32% of source files (in the repository involving more than 6M versions of around 2M files) were duplicated among Avaya's source code repositories for more than ten different projects. The prefix-postfix algorithm is more suitable than the prefix algorithm because its error rates were reasonable when validated against known copying behaviors.
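
The linking idea in the abstract (joining the change histories of files that had the same code at any point in the past) can be illustrated with a small sketch. This is not the prefix-postfix or prefix algorithm, whose details the abstract does not give; it is a simplified stand-in that compares content hashes of every version and merges matching files with a union-find structure. The names `content_key`, `UnionFind`, and `link_histories`, and the `(repo, path)` file identifiers, are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict


def content_key(text: str) -> str:
    """Hash one file version so identical contents compare equal."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


class UnionFind:
    """Minimal disjoint-set structure used to merge linked files."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving keeps the trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def link_histories(histories):
    """Group files whose histories share identical content at any version.

    histories: dict mapping a hypothetical (repo, path) identifier to the
    list of that file's version contents, oldest first.
    Returns a list of sets of identifiers; each set is one linked history.
    """
    uf = UnionFind()
    first_seen = {}  # content hash -> first file observed with that content
    for file_id, versions in histories.items():
        uf.find(file_id)  # register the file even if it matches nothing
        for text in versions:
            key = content_key(text)
            if key in first_seen:
                uf.union(first_seen[key], file_id)
            else:
                first_seen[key] = file_id
    groups = defaultdict(set)
    for file_id in histories:
        groups[uf.find(file_id)].add(file_id)
    return list(groups.values())
```

For example, if `("repoA", "proj1/src/util.c")` has versions `["v1", "v2"]` and `("repoB", "proj2/src/util.c")` has versions `["v2", "v3"]`, the two files end up in one group because they share the content `"v2"`. The sketch compares every version of every file. The paper instead relies on full pathnames and the assumption that whole directories are copied, a step this sketch does not show.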