Using file relationships in malware classification

Authors:
Nikos Karampatziakis;Jack W. Stokes;Anil Thomas;Mady Marinescu
Affiliations:
Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA;Microsoft Corporation, Redmond, WA
Venue:
DIMVA'12 Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment
Year:
2012

Abstract

Typical malware classification methods analyze unknown files in isolation. This approach ignores valuable relationships between malware files, such as containment in a zip archive, dropping, or downloading. We present a new malware classification system based on a graph induced by file relationships and, as a proof of concept, analyze containment relationships, for which ample data is available. Our methodology is nevertheless general: it relies only on an initial estimate for some of the files in our data and on propagating information along the edges of the graph, so it can be applied to other types of file relationships. We show that, because malicious files are often included in multiple malware containers, the system's detection accuracy can be significantly improved, particularly at low false positive rates, which are the main operating points for automated malware classifiers. For example, at a false positive rate of 0.2%, the false negative rate decreases from 42.1% to 15.2%. Finally, the new system is highly scalable; our basic implementation can learn good classifiers from a large bipartite graph including over 719 thousand containers and 3.4 million files in a total of 16 minutes.
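
The idea of propagating information along the edges of a bipartite container-file graph can be illustrated with a small sketch. The abstract does not specify the propagation rule, so the simple averaging scheme below is an assumption chosen for illustration and is not the authors' algorithm. The function name `propagate_scores`, the `default` score of 0.5, and the toy archive and file names are all hypothetical.

```python
from collections import defaultdict


def propagate_scores(edges, file_prior, default=0.5, rounds=1):
    """Toy score propagation on a bipartite container-file graph.

    edges      : iterable of (container_id, file_id) containment pairs
    file_prior : dict mapping some file ids to an initial malice estimate in [0, 1]
    default    : assumed starting score for files that have no initial estimate
    rounds     : number of container -> file propagation passes
    """
    files_in = defaultdict(set)       # container -> files it contains
    containers_of = defaultdict(set)  # file -> containers that include it
    for container, file_id in edges:
        files_in[container].add(file_id)
        containers_of[file_id].add(container)

    # Files with an initial estimate start from it; the rest start at `default`.
    file_score = {f: file_prior.get(f, default) for f in containers_of}
    container_score = {}

    for _ in range(rounds):
        # A container's score is the mean score of the files it contains.
        container_score = {
            c: sum(file_score[f] for f in fs) / len(fs)
            for c, fs in files_in.items()
        }
        # Files without an initial estimate take the mean score of their
        # containers; files with an initial estimate keep it unchanged.
        for f, cs in containers_of.items():
            if f not in file_prior:
                file_score[f] = sum(container_score[c] for c in cs) / len(cs)

    return container_score, file_score


# Hypothetical data: "u" shares two archives with a file believed malicious,
# while "v" shares one archive with a file believed benign.
edges = [
    ("a.zip", "x"), ("a.zip", "u"),
    ("b.zip", "x"), ("b.zip", "u"),
    ("c.zip", "y"), ("c.zip", "v"),
]
file_prior = {"x": 1.0, "y": 0.0}
container_score, file_score = propagate_scores(edges, file_prior)
```

In this toy graph the unknown file that co-occurs with a suspected-malicious file in two containers ends up with a higher score than the unknown file that co-occurs with a suspected-benign one, which mirrors the intuition in the abstract that malicious files tend to appear in multiple malware containers.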