FORECAST: skimming off the malware cream

Authors:
Matthias Neugschwandtner;Paolo Milani Comparetti;Gregoire Jacob;Christopher Kruegel
Affiliations:
Vienna University of Technology;Vienna University of Technology;University of California, Santa Barbara;University of California, Santa Barbara
Venue:
Proceedings of the 27th Annual Computer Security Applications Conference
Year:
2011

Citing 21

Classification by pairwise coupling. NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
PolyUnpack: Automating the Hidden-Code Extraction of Unpack-Executing Malware. ACSAC '06 Proceedings of the 22nd Annual Computer Security Applications Conference
Generalized Bradley-Terry Models and Multi-Class Probability Estimates. The Journal of Machine Learning Research
Learning to Detect and Classify Malicious Executables in the Wild. The Journal of Machine Learning Research
Detecting Obfuscated Viruses Using Cosine Similarity Analysis. AMS '07 Proceedings of the First Asia International Conference on Modelling & Simulation
Bro: a system for detecting network intruders in real-time. SSYM'98 Proceedings of the 7th conference on USENIX Security Symposium - Volume 7
Renovo: a hidden code extractor for packed executables. Proceedings of the 2007 ACM workshop on Recurring malcode
Confidence-weighted linear classification. Proceedings of the 25th international conference on Machine learning
Learning and Classification of Malware Behavior. DIMVA '08 Proceedings of the 5th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
LIBLINEAR: A Library for Large Linear Classification. The Journal of Machine Learning Research
McBoost: Boosting Scalability in Malware Collection and Analysis Using Statistical Classification of Executables. ACSAC '08 Proceedings of the 2008 Annual Computer Security Applications Conference
Malware detection using statistical analysis of byte-level file content. Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics
FIRE: FInding Rogue nEtworks. ACSAC '09 Proceedings of the 2009 Annual Computer Security Applications Conference
Improving the efficiency of dynamic malware analysis. Proceedings of the 2010 ACM Symposium on Applied Computing
Automated classification and analysis of internet malware. RAID'07 Proceedings of the 10th international conference on Recent advances in intrusion detection
peHash: a novel approach to fast malware clustering. LEET'09 Proceedings of the 2nd USENIX conference on Large-scale exploits and emergent threats: botnets, spyware, worms, and more
Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Automatic generation of remediation procedures for malware infections. USENIX Security'10 Proceedings of the 19th USENIX conference on Security
JACKSTRAWS: picking command and control connections from bot traffic. SEC'11 Proceedings of the 20th USENIX conference on Security
New malicious code detection using variable length n-grams. ICISS'06 Proceedings of the Second international conference on Information Systems Security

Cited 4

Malware classification method via binary content comparison. Proceedings of the 2012 ACM Research in Applied Computation Symposium
MAST: triage for market-scale mobile malware analysis. Proceedings of the sixth ACM conference on Security and privacy in wireless and mobile networks
A static, packer-agnostic filter to detect similar malware samples. DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
Tracking memory writes for malware classification and code reuse identification. DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

To handle the large number of malware samples appearing in the wild each day, security analysts and vendors employ automated tools to detect, classify, and analyze malicious code. Because malware is typically resistant to static analysis, automated dynamic analysis is widely used for this purpose. Executing malicious software in a controlled environment while observing its behavior can provide rich information on a sample's capabilities. However, running each malware sample even for a few minutes is expensive. For this reason, malware analysis efforts need to select a subset of samples for analysis. To date, this selection has been performed either randomly or using techniques focused on avoiding re-analysis of polymorphic malware variants [41, 23]. In this paper, we present a novel approach to sample selection that attempts to maximize the total value of the information obtained from analysis, according to an application-dependent scoring function. To this end, we leverage previous work on behavioral malware clustering [14] and introduce a machine-learning-based system that uses all statically available information to predict into which behavioral class a sample will fall, before the sample is actually executed. We discuss scoring functions tailored to two practical applications of large-scale dynamic analysis: the compilation of network blacklists of command and control servers and the generation of remediation procedures for malware infections. We implement these techniques in a tool called ForeCast. Large-scale evaluation on over 600,000 malware samples shows that our prototype can increase the number of potential command and control servers detected by up to 137% over a random selection strategy and 54% over a selection strategy based on sample diversity.
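
The selection idea in the abstract can be illustrated with a minimal sketch: given a predicted probability for each behavioral class and an application-dependent value per class, rank samples by expected value and analyze only as many as the budget allows. This is not the ForeCast implementation. The function names, class labels, probabilities, and per-class values below are all hypothetical, and the abstract does not say how scores are actually computed or whether they change as samples are analyzed; here they are assumed to be fixed.

```python
from typing import Dict, List

# Hypothetical types: class label -> probability, and class label -> value.
ClassProbs = Dict[str, float]
ClassValues = Dict[str, float]


def expected_score(class_probs: ClassProbs, class_value: ClassValues) -> float:
    """Expected value of analyzing one sample under fixed per-class values."""
    return sum(p * class_value.get(label, 0.0) for label, p in class_probs.items())


def select_samples(
    predictions: Dict[str, ClassProbs],
    class_value: ClassValues,
    budget: int,
) -> List[str]:
    """Return up to `budget` sample names, highest expected score first."""
    ranked = sorted(
        predictions,
        key=lambda name: expected_score(predictions[name], class_value),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Invented predictions from a static classifier (assumption, not real data).
    predictions = {
        "a.exe": {"bot_new_c2": 0.7, "known_family": 0.3},
        "b.exe": {"bot_new_c2": 0.1, "known_family": 0.9},
        "c.exe": {"bot_new_c2": 0.4, "known_family": 0.6},
    }
    # Invented values: a blacklist application might value likely C&C traffic more.
    class_value = {"bot_new_c2": 5.0, "known_family": 0.5}
    print(select_samples(predictions, class_value, budget=2))
```

A different application, such as generating remediation procedures, would plug in a different `class_value` mapping while leaving the ranking logic unchanged.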