Program binaries are an artifact of a production process that begins with source code and ends with a string of bytes representing executable code. There are many reasons to want to know the specifics of this process for a given binary: forensic investigation of malware, diagnosing the role of the compiler in crashes or performance problems, and reverse engineering and decompilation. Binaries, however, are not generally annotated with such provenance details. Intuitively, binary code should exhibit properties specific to the process that produced it, but it is not at all clear how to find such properties and map them to specific elements of that process. In this paper, we present an automatic technique to recover toolchain provenance: those details, such as the source language, the compiler, and the compilation options, that define the transformation process through which the binary was produced. We approach provenance recovery as a classification problem, discovering characteristics of binary code that are strongly associated with particular toolchain components and developing models that infer the likely provenance of program binaries. Our experiments show that toolchain provenance can be recovered with high accuracy, approaching 100% for some components and yielding good results (90%) even when the binaries emitted by different components appear very similar.
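To make the classification framing concrete, the following is a minimal sketch and not the method of the paper: the abstract does not specify the features or models used, so this example swaps in byte bigram counts and a nearest-centroid classifier with cosine similarity. The labels toolchain_a and toolchain_b, the function names, and the short byte strings are all hypothetical placeholders chosen for illustration, not real measurements.

```python
from collections import Counter
import math


def ngram_features(code: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a region of code bytes."""
    return Counter(code[i:i + n] for i in range(len(code) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Sum the n-gram counts of all samples per label (one centroid per label)."""
    centroids = {}
    for label, code in samples:
        centroids.setdefault(label, Counter()).update(ngram_features(code))
    return centroids


def predict(centroids, code: bytes) -> str:
    """Return the label whose centroid is most similar to the given bytes."""
    feats = ngram_features(code)
    return max(centroids, key=lambda label: cosine(centroids[label], feats))


# Hypothetical toy training set: two made-up toolchains that differ in the
# byte patterns they tend to emit. These are illustrative placeholders only.
TRAINING = [
    ("toolchain_a", bytes.fromhex("5589e583ec10c9c3")),
    ("toolchain_a", bytes.fromhex("5589e55383ec045bc9c3")),
    ("toolchain_b", bytes.fromhex("558bec83ec108be55dc3")),
    ("toolchain_b", bytes.fromhex("558bec518be55dc3")),
]

MODEL = train(TRAINING)

if __name__ == "__main__":
    unknown = bytes.fromhex("5589e583ec08c9c3")
    print(predict(MODEL, unknown))
```

The sketch only illustrates the general shape of the problem: features are derived from the bytes alone, and a model maps them to a toolchain label. A realistic system would need far richer features and separate models for each provenance component (source language, compiler, compilation options).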