Classification by pairwise coupling. NIPS '97 Proceedings of the 1997 conference on Advances in neural information processing systems 10
Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers. ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
PolyUnpack: Automating the Hidden-Code Extraction of Unpack-Executing Malware. ACSAC '06 Proceedings of the 22nd Annual Computer Security Applications Conference
Generalized Bradley-Terry Models and Multi-Class Probability Estimates. The Journal of Machine Learning Research
Learning to Detect and Classify Malicious Executables in the Wild. The Journal of Machine Learning Research
Detecting Obfuscated Viruses Using Cosine Similarity Analysis. AMS '07 Proceedings of the First Asia International Conference on Modelling & Simulation
Bro: a system for detecting network intruders in real-time. SSYM'98 Proceedings of the 7th conference on USENIX Security Symposium - Volume 7
Renovo: a hidden code extractor for packed executables. Proceedings of the 2007 ACM workshop on Recurring malcode
Confidence-weighted linear classification. Proceedings of the 25th international conference on Machine learning
Learning and Classification of Malware Behavior. DIMVA '08 Proceedings of the 5th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
LIBLINEAR: A Library for Large Linear Classification. The Journal of Machine Learning Research
ACSAC '08 Proceedings of the 2008 Annual Computer Security Applications Conference
Malware detection using statistical analysis of byte-level file content. Proceedings of the ACM SIGKDD Workshop on CyberSecurity and Intelligence Informatics
ACSAC '09 Proceedings of the 2009 Annual Computer Security Applications Conference
Improving the efficiency of dynamic malware analysis. Proceedings of the 2010 ACM Symposium on Applied Computing
Automated classification and analysis of internet malware. RAID'07 Proceedings of the 10th international conference on Recent advances in intrusion detection
peHash: a novel approach to fast malware clustering. LEET'09 Proceedings of the 2nd USENIX conference on Large-scale exploits and emergent threats: botnets, spyware, worms, and more
Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Automatic generation of remediation procedures for malware infections. USENIX Security'10 Proceedings of the 19th USENIX conference on Security
JACKSTRAWS: picking command and control connections from bot traffic. SEC'11 Proceedings of the 20th USENIX conference on Security
New malicious code detection using variable length n-grams. ICISS'06 Proceedings of the Second international conference on Information Systems Security
Malware classification method via binary content comparison. Proceedings of the 2012 ACM Research in Applied Computation Symposium
MAST: triage for market-scale mobile malware analysis. Proceedings of the sixth ACM conference on Security and privacy in wireless and mobile networks
A static, packer-agnostic filter to detect similar malware samples. DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
Tracking memory writes for malware classification and code reuse identification. DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
To handle the large number of malware samples appearing in the wild each day, security analysts and vendors employ automated tools to detect, classify and analyze malicious code. Because malware is typically resistant to static analysis, automated dynamic analysis is widely used for this purpose. Executing malicious software in a controlled environment while observing its behavior can provide rich information on a sample's capabilities. However, running each malware sample, even for only a few minutes, is expensive. For this reason, malware analysis efforts must select a subset of samples for analysis. To date, this selection has been performed either randomly or with techniques focused on avoiding re-analysis of polymorphic malware variants [41, 23]. In this paper, we present a novel approach to sample selection that attempts to maximize the total value of the information obtained from analysis, according to an application-dependent scoring function. To this end, we leverage previous work on behavioral malware clustering [14] and introduce a machine-learning-based system that uses all statically available information to predict into which behavioral class a sample will fall, before the sample is actually executed. We discuss scoring functions tailored to two practical applications of large-scale dynamic analysis: compiling network blacklists of command-and-control (C&C) servers and generating remediation procedures for malware infections. We implement these techniques in a tool called ForeCast. A large-scale evaluation on over 600,000 malware samples shows that our prototype can increase the number of potential C&C servers detected by up to 137% over a random selection strategy and 54% over a selection strategy based on sample diversity.
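The abstract does not spell out how predicted behavioral classes and a scoring function combine into a selection. The following is a minimal sketch under stated assumptions, and it is not ForeCast's actual algorithm. It assumes that a classifier (not shown) already gives each unexecuted sample a probability distribution over behavioral classes. It also assumes a hypothetical scoring function `score(cls, n)` that returns the value of analyzing one more sample of class `cls` after `n` samples of that class have been analyzed (expected, so possibly fractional). Samples are then picked greedily by expected score under a fixed budget. The names `Sample`, `expected_score`, `select_samples` and `cc_score` are illustrative only. The diminishing-returns C&C score is likewise an assumption: it models the idea that samples in an already-analyzed behavioral class tend to reveal C&C servers that are already known.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical scoring-function type: (behavioral class, expected number of
# samples of that class already analyzed) -> value of analyzing one more.
ScoreFn = Callable[[str, float], float]


@dataclass
class Sample:
    sample_id: str
    # Predicted P(behavioral class | static features); assumed to come from
    # a multi-class classifier that is not shown here.
    class_probs: Dict[str, float]


def expected_score(sample: Sample, score: ScoreFn, analyzed: Dict[str, float]) -> float:
    """Expected value of executing `sample`, averaged over its predicted classes."""
    return sum(p * score(cls, analyzed.get(cls, 0.0))
               for cls, p in sample.class_probs.items())


def select_samples(samples: Sequence[Sample], score: ScoreFn, budget: int) -> List[str]:
    """Greedily choose up to `budget` samples with the highest expected score."""
    remaining = list(samples)
    analyzed: Dict[str, float] = {}  # expected analyzed count per class
    selected: List[str] = []
    while remaining and len(selected) < budget:
        best_idx = max(range(len(remaining)),
                       key=lambda i: expected_score(remaining[i], score, analyzed))
        best = remaining.pop(best_idx)
        selected.append(best.sample_id)
        # The true class is only known after execution; as a simplification,
        # credit each class with the sample's predicted probability instead.
        for cls, p in best.class_probs.items():
            analyzed[cls] = analyzed.get(cls, 0.0) + p
    return selected


if __name__ == "__main__":
    # Made-up per-class values (e.g. new C&C servers a sample of the class
    # might reveal), discounted as more samples of the class are analyzed.
    value = {"fam1": 10.0, "fam2": 8.0, "quiet": 0.0}

    def cc_score(cls: str, n: float) -> float:
        return value[cls] / (1.0 + n)

    samples = [
        Sample("a", {"fam1": 0.9, "quiet": 0.1}),
        Sample("b", {"fam1": 0.8, "quiet": 0.2}),
        Sample("c", {"fam2": 0.6, "quiet": 0.4}),
    ]
    print(select_samples(samples, cc_score, budget=2))  # ['a', 'c']
```

With these made-up inputs, sample `b` probably falls into the same class as `a`. The discounted score therefore ranks `c` second. A score without the `1 + n` discount would pick `a` and `b`, which is the redundancy a value-driven selection is meant to avoid.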