Machine Learning
Winnowing: local algorithms for document fingerprinting
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Semantics-Aware Malware Detection
SP '05 Proceedings of the 2005 IEEE Symposium on Security and Privacy
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Learning to Detect and Classify Malicious Executables in the Wild
The Journal of Machine Learning Research
Foundations and Trends in Information Retrieval
LIBLINEAR: A Library for Large Linear Classification
The Journal of Machine Learning Research
Information theoretic measures for clusterings comparison: is a correction for chance necessary?
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning
Distance Metric Learning for Large Margin Nearest Neighbor Classification
The Journal of Machine Learning Research
Extracting compiler provenance from program binaries
Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Recognizing authors: an examination of the consistent programmer hypothesis
Software Testing, Verification & Reliability
Recovering the toolchain provenance of binary code
Proceedings of the 2011 International Symposium on Software Testing and Analysis
Polymorphic worm detection using structural information of executables
RAID'05 Proceedings of the 8th international conference on Recent Advances in Intrusion Detection
Detecting machine-morphed malware variants via engine attribution
Journal in Computer Virology
Hi-index | 0.00 |
Program authorship attribution--identifying a programmer based on stylistic characteristics of code--has practical implications for detecting software theft, digital forensics, and malware analysis. Authorship attribution is challenging in these domains where usually only binary code is available; existing source code-based approaches to attribution have left unclear whether and to what extent programmer style survives the compilation process. Casting authorship attribution as a machine learning problem, we present a novel program representation and techniques that automatically detect the stylistic features of binary code. We apply these techniques to two attribution problems: identifying the precise author of a program, and finding stylistic similarities between programs by unknown authors. Our experiments provide strong evidence that programmer style is preserved in program binaries.