Empirical Software Engineering
Deployed software systems are typically composed of many pieces, not all of which may have been created by the main development team. Often, the provenance of included components, such as external libraries or cloned source code, is not clearly stated, and this uncertainty can introduce technical and ethical concerns that make it difficult for system owners and other stakeholders to manage their software assets. In this work, we motivate the need to recover the provenance of software entities using a broad set of techniques, which could include signature matching, source code fact extraction, software clone detection, call flow graph matching, string matching, historical analyses, and others. We liken our provenance goals to those of Bertillonage, a simple and approximate forensic analysis technique based on biometrics that was developed in 19th-century France before the advent of fingerprints. As an example, we have developed a fast, simple, and approximate technique called anchored signature matching for identifying library version information within a given Java application. This technique involves a type of structured signature matching performed against a database of candidates drawn from the Maven2 repository, a 150GB collection of open source Java libraries. An exploratory case study using a proprietary e-commerce Java application illustrates that the approach is both feasible and effective.
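The abstract does not spell out how anchored signature matching works, so the following is only a minimal sketch of the general idea of matching an archive's class signatures against a database of candidate library versions. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the function names, the use of a SHA-1 hash over a fully qualified class name plus sorted member declarations as the "signature", the Jaccard similarity used for ranking, and the hypothetical library `demo-lib` with its two versions.

```python
import hashlib


def class_signature(fq_name, members):
    """Hash a class's fully qualified name plus its sorted member declarations.

    Assumption: the class name acts as the anchor, and member order is
    ignored so that reordering declarations does not change the signature.
    """
    text = fq_name + "\n" + "\n".join(sorted(members))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def archive_signatures(classes):
    """Map an archive, given as {class name: [member declarations]}, to a set of hashes."""
    return {class_signature(name, members) for name, members in classes.items()}


def rank_candidates(subject, index):
    """Rank candidate versions by Jaccard similarity to the subject archive."""
    scores = []
    for version, signatures in index.items():
        union = subject | signatures
        score = len(subject & signatures) / len(union) if union else 0.0
        scores.append((version, score))
    # Highest score first; ties broken by version label for determinism.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical candidate database: two versions of a made-up library.
v1_0 = {
    "org.demo.Util": ["int size()", "void clear()"],
    "org.demo.Main": ["void run()"],
}
v1_1 = {
    "org.demo.Util": ["int size()", "void clear()", "boolean isEmpty()"],
    "org.demo.Main": ["void run()"],
}
index = {
    "demo-lib-1.0": archive_signatures(v1_0),
    "demo-lib-1.1": archive_signatures(v1_1),
}

# An archive of unknown version found inside an application (here, a copy of 1.1).
subject = archive_signatures(v1_1)
ranked = rank_candidates(subject, index)
print(ranked)
```

In this toy setup the unknown archive matches `demo-lib-1.1` exactly, while `demo-lib-1.0` shares only one of three distinct class signatures. A real index built from a repository the size of Maven2 would need real class-file parsing and a more scalable lookup than this linear scan.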