Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components (such as external libraries or cloned source code) is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need to recover the provenance of software entities using a broad set of techniques that could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, and historical analyses, among others. We liken our provenance goals to those of Bertillonage, a simple and approximate forensic analysis technique based on biometrics that was developed in 19th century France before the advent of fingerprinting. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying the source origin of binary libraries within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 275 GB collection of open source Java libraries. To show that the approach is both valid and effective, we conducted an empirical study on 945 jars from the Debian GNU/Linux distribution, as well as an industrial case study on 81 jars from an e-commerce application.
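The abstract names anchored signature matching but does not spell out its mechanics, so the following is only a minimal sketch under stated assumptions: the fully qualified class name serves as the anchor, a class's signature is a SHA-1 hash over its sorted method signatures, and candidates are ranked by the fraction of the subject's classes they match exactly. The function names (`class_signature`, `build_index`, `rank_candidates`), the artifact identifiers, and the scoring rule are all hypothetical and are not taken from the paper.

```python
import hashlib
from collections import defaultdict


def class_signature(fqn, methods):
    """Hash a class's name plus its sorted method signatures (assumed signature form)."""
    canonical = fqn + "\n" + "\n".join(sorted(methods))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def build_index(corpus):
    """Map anchor (class name) -> signature hash -> set of artifact ids.

    corpus: {artifact_id: {fully_qualified_class_name: [method signatures]}}
    """
    index = defaultdict(lambda: defaultdict(set))
    for artifact_id, classes in corpus.items():
        for fqn, methods in classes.items():
            index[fqn][class_signature(fqn, methods)].add(artifact_id)
    return index


def rank_candidates(subject, index):
    """Rank artifacts by the fraction of the subject's classes they match exactly."""
    hits = defaultdict(int)
    for fqn, methods in subject.items():
        # Look up by anchor first, then compare signatures only under that anchor.
        matches = index.get(fqn, {}).get(class_signature(fqn, methods), ())
        for artifact_id in matches:
            hits[artifact_id] += 1
    total = len(subject) or 1
    return sorted(((a, n / total) for a, n in hits.items()),
                  key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy corpus standing in for a repository of known library versions.
corpus = {
    "demo-lib-1.0": {
        "org.demo.Util": ["int size()", "void clear()"],
        "org.demo.Main": ["void run()"],
    },
    "demo-lib-1.1": {
        "org.demo.Util": ["int size()", "void clear()", "boolean isEmpty()"],
        "org.demo.Main": ["void run()"],
    },
}
# Class facts extracted from a jar of unknown origin (method order is irrelevant).
subject = {
    "org.demo.Util": ["void clear()", "int size()"],
    "org.demo.Main": ["void run()"],
}
ranking = rank_candidates(subject, build_index(corpus))
```

Looking up by anchor before comparing signatures keeps the search narrow: a subject class is only compared with corpus classes of the same name, which is one plausible reason such a technique can stay fast against a large repository. The result is a ranked shortlist of candidate origins rather than a definitive identification, in keeping with the approximate spirit of Bertillonage.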