N-Gram-Based Detection of New Malicious Code
COMPSAC '04 Proceedings of the 28th Annual International Computer Software and Applications Conference - Workshops and Fast Abstracts - Volume 02
Toward Automated Dynamic Malware Analysis Using CWSandbox
IEEE Security and Privacy
Multi-resolution similarity hashing
Digital Investigation: The International Journal of Digital Forensics & Incident Response
Proceedings of the 4th ACM workshop on Security and artificial intelligence
FORECAST: skimming off the malware cream
Proceedings of the 27th Annual Computer Security Applications Conference
Banksafe information stealer detection inside the web browser
RAID'11 Proceedings of the 14th international conference on Recent Advances in Intrusion Detection
A static, packer-agnostic filter to detect similar malware samples
DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
DUET: integration of dynamic and static analyses for malware clustering with cluster ensembles
Proceedings of the 29th Annual Computer Security Applications Conference
SigMal: a static signal processing based malware triage
Proceedings of the 29th Annual Computer Security Applications Conference
Simseer and bugwise: web services for binary-level software similarity and defect detection
AusPDC '13 Proceedings of the Eleventh Australasian Symposium on Parallel and Distributed Computing - Volume 140
MutantX-S: scalable malware clustering based on static features
USENIX ATC'13 Proceedings of the 2013 USENIX conference on Annual Technical Conference
Hi-index | 0.00 |
Data collection is not a big issue anymore with available honeypot software and setups. However malware collections gathered from these honeypot systems often suffer from massive sample counts, data analysis systems like sandboxes cannot cope with. Sophisticated self-modifying malware is able to generate new polymorphic instances of itself with different message digest sums for each infection attempt, thus resulting in many different samples stored for the same specimen. Scaling analysis systems that are fed by databases that rely on sample uniqueness based on message digests is only feasible to a certain extent. In this paper we introduce a non cryptographic, fast to calculate hash function for binaries in the Portable Executable format that transforms structural information about a sample into a hash value. Grouping binaries by hash values calculated with the new function allows for detection of multiple instances of the same polymorphic specimen as well as samples that are broken e.g. due to transfer errors. Practical evaluation on different malware sets shows that the new function allows for a significant reduction of sample counts.