Decompilation of binary programs
Software—Practice & Experience
Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data
ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Extracting safe and precise control flow from binaries
RTCSA '00 Proceedings of the Seventh International Conference on Real-Time Systems and Applications
Recovery of Jump Table Case Statements from Binary Code
IWPC '99 Proceedings of the 7th International Workshop on Program Comprehension
Learning to Detect and Classify Malicious Executables in the Wild
The Journal of Machine Learning Research
Static disassembly of obfuscated binaries
SSYM'04 Proceedings of the 13th conference on USENIX Security Symposium - Volume 13
Learning to analyze binary computer code
AAAI'08 Proceedings of the 23rd national conference on Artificial intelligence - Volume 2
Efficiently inducing features of conditional random fields
UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
Polymorphic worm detection using structural information of executables
RAID'05 Proceedings of the 8th international conference on Recent Advances in Intrusion Detection
Recovering the toolchain provenance of binary code
Proceedings of the 2011 International Symposium on Software Testing and Analysis
Supervised learning for provenance-similarity of binaries
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Labeling library functions in stripped binaries
Proceedings of the 10th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools
Who wrote this code? identifying the authors of program binaries
ESORICS'11 Proceedings of the 16th European conference on Research in computer security
Tracking concept drift in malware families
Proceedings of the 5th ACM workshop on Security and artificial intelligence
Automatically mining program build information via signature matching
Proceedings of the 11th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering
Binary-code obfuscations in prevalent packer tools
ACM Computing Surveys (CSUR)
Hi-index | 0.00 |
We present a novel technique that identifies the source compiler of program binaries, an important element of program provenance. Program provenance answers fundamental questions of malware analysis and software forensics, such as whether programs are generated by similar tool chains; it also can allow development of debugging, performance analysis, and instrumentation tools specific to particular compilers. We formulate compiler identification as a structured learning problem, automatically building models to recognize sequences of binary code generated by particular compilers. We evaluate our techniques on a large set of real-world test binaries, showing that our models identify the source compiler of binary code with over 90% accuracy, even in the presence of interleaved code from multiple compilers. A case study demonstrates the use of inferred compiler provenance to augment stripped binary parsing, reducing parsing errors by 18%.