Software lineage refers to the evolutionary relationship among a collection of programs. The goal of software lineage inference is to recover that lineage given a set of program binaries. Software lineage can provide highly useful information in many security scenarios, such as malware triage and software vulnerability tracking. In this paper, we systematically study software lineage inference by exploring four fundamental questions not addressed by prior work. First, how can we automatically infer software lineage from program binaries? Second, how should we measure the quality of lineage inference algorithms? Third, how useful are existing approaches to binary similarity analysis for inferring lineage in practice, and how useful would they be in an idealized setting? Fourth, what limitations must any software lineage inference algorithm cope with? Toward these goals, we build ILINE, a system for automatic software lineage inference from program binaries, and IEVAL, a system for the scientific assessment of lineage quality. We evaluated ILINE on two types of lineage, straight-line and directed acyclic graph (DAG), using large-scale real-world programs: 1,777 goodware programs spanning over a combined 110 years of development history, and 114 malware samples with known lineage collected by the DARPA Cyber Genome program. We used IEVAL to study seven metrics that assess diverse properties of lineage. Our results reveal that partial order mismatches and graph arc edit distance often yield the most meaningful comparisons in our experiments. Even without assuming any prior information about the data sets, ILINE proved effective at lineage inference, achieving a mean accuracy of over 84% for goodware and over 72% for malware in our data sets.
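The following minimal Python sketch illustrates straight-line lineage inference and the two metrics highlighted above. It rests on several assumptions of our own:

- Each program is represented as a set of hypothetical features (for instance, hashed byte n-grams), and programs are compared with Jaccard similarity.
- `infer_straight_line` is a hypothetical greedy heuristic. Unlike the setting described above, it is given the root version. It then chains each next version to the most similar remaining one.
- `partial_order_mismatches` counts the version pairs whose relative order in an inferred straight line disagrees with the ground truth.
- `arc_edit_distance` counts the arcs that must be deleted or inserted to turn one lineage graph into the other.

These are simplified readings of the metric names, not ILINE's algorithm or IEVAL's definitions.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two feature sets (defined as 1.0 for two empty sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def infer_straight_line(features: dict[str, set[str]], root: str) -> list[str]:
    """Hypothetical greedy heuristic: start at the given root and repeatedly
    append the unvisited program most similar to the last one appended."""
    order = [root]
    remaining = set(features) - {root}
    while remaining:
        last = features[order[-1]]
        # Iterating in sorted order makes max() break ties by smallest name.
        nxt = max(sorted(remaining), key=lambda p: jaccard(last, features[p]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def partial_order_mismatches(inferred: list[str], truth: list[str]) -> int:
    """Count pairs (a, b) where a precedes b in the truth but follows b in the
    inferred order (the Kendall tau distance between two total orders)."""
    pos = {p: i for i, p in enumerate(inferred)}
    return sum(1 for a, b in combinations(truth, 2) if pos[a] > pos[b])


def lineage_edges(order: list[str]) -> set[tuple[str, str]]:
    """Directed parent -> child arcs of a straight-line lineage."""
    return set(zip(order, order[1:]))


def arc_edit_distance(inferred: set[tuple[str, str]],
                      truth: set[tuple[str, str]]) -> int:
    """Arcs to delete plus arcs to insert, assuming both graphs share the same
    vertex set: the size of the symmetric difference of their arc sets."""
    return len(inferred ^ truth)


if __name__ == "__main__":
    feats = {
        "v1": {"a", "b", "c"},
        "v2": {"a", "b", "c", "d"},
        "v3": {"a", "b", "c", "d", "e"},
        "v4": {"b", "c", "d", "e", "f"},
    }
    truth = ["v1", "v2", "v3", "v4"]
    print(infer_straight_line(feats, root="v1"))  # ['v1', 'v2', 'v3', 'v4']
    swapped = ["v1", "v3", "v2", "v4"]
    print(partial_order_mismatches(swapped, truth))  # 1
    print(arc_edit_distance(lineage_edges(swapped), lineage_edges(truth)))  # 6
```

In the example, swapping two adjacent versions costs one partial order mismatch but six arc edits. None of the three arcs in the swapped chain appears in the true chain, and vice versa. `arc_edit_distance` operates on arbitrary sets of directed edges, so it also applies to DAG lineages. The mismatch count in this sketch handles only total orders.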