Machine Learning
BinHunt: Automatically Finding Semantic Differences in Binary Programs
ICICS '08 Proceedings of the 10th International Conference on Information and Communications Security
Computing the behavior of malicious code with function extraction technology
Proceedings of the 5th Annual Workshop on Cyber Security and Information Intelligence Research: Cyber Security and Information Intelligence Challenges and Strategies
Detecting code clones in binary executables
Proceedings of the eighteenth international symposium on Software testing and analysis
Large-scale malware indexing using function-call graphs
Proceedings of the 16th ACM conference on Computer and communications security
A static birthmark of binary executables based on API call structure
ASIAN'07 Proceedings of the 12th Asian computing science conference on Advances in computer science: computer and network security
Extracting compiler provenance from program binaries
Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering
Automatic malware categorization using cluster ensemble
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Towards semantic comparison of multi-granularity process traces
Knowledge-Based Systems
Hi-index | 0.00 |
Understanding, measuring, and leveraging the similarity of binaries (executable code) is a foundational challenge in software engineering. We present a notion of similarity based on provenance -- two binaries are similar if they are compiled from the same (or very similar) source code with the same (or similar) compilers. Empirical evidence suggests that provenance-similarity accounts for a significant portion of variation in existing binaries, particularly in malware. We propose and evaluate the applicability of classification to detect provenance-similarity. We evaluate a variety of classifiers, and different types of attributes and similarity labeling schemes, on two benchmarks derived from open-source software and malware respectively. We present encouraging results indicating that classification is a viable approach for automated provenance-similarity detection, and as an aid for malware analysts in particular.